The aim of the project is to investigate user behavior in the app of a company (a startup that sells food products).
First we will preprocess the data and study the sales funnel.
Next we will check the results of an A/A/B test: whether changing the fonts for the entire app has any impact on users. We will pick an equal number of users for all 3 groups and run a statistical test to check whether there is a statistically significant difference in the number of users for any of the events.
Part 1. Data preprocessing
Part 2. Data analysis
How many events are in the logs
How many users are in the logs
What's the average number of events per user
What period of time does the data cover.
Did you lose many events and users when excluding the older data
Part 3. Event funnel
What events are in the logs and their frequency of occurrence
Number of users who performed each of these actions. Proportion of users who performed the action at least once.
In what order actions took place. Are all of them part of a single sequence?
Event funnel and share of users that proceed from each stage to the next. Ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.
Part 4. Study of the results of experiment
How many users are there in each group
A/A test. Checking a statistically significant difference between samples 246 and 247
A/A/B test
General Conclusion
import pandas as pd
from IPython.display import display
from datetime import datetime
import plotly.express as px
from plotly import graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats as st
import seaborn as sns
import sys
import warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")
import math
#adjustments of the path for the Practicum platform:
df = pd.read_csv('./logs_exp_us.csv', sep=r'\s+')  # equivalent to delim_whitespace=True, which is deprecated in newer pandas
display(df.head())
display(df.sample(6))
df.info(memory_usage='deep')
#checking whether the object-type column can be converted to the category type
df['EventName'].value_counts()
df['EventName'] = df['EventName'].astype('category')
#timestamp column needs to be converted, getting rid of unix time format
df['EventTimestamp'] = pd.to_datetime(df['EventTimestamp'],unit='s')
display(df['EventTimestamp'])
#re-checking memory usage
df.info(memory_usage='deep')
The memory usage has dropped significantly: from 16.4 MB to 5.8 MB.
df['event_date'] = df['EventTimestamp'].dt.date
df['event_time'] = df['EventTimestamp'].dt.time
df.sample(6)
df['event_date'] = pd.to_datetime(df['event_date'], format='%Y-%m-%d')
#df['event_time'] = pd.to_datetime(df['event_time'], format='%H:%M:%S')
#renaming columns
df.columns = ['event_name' ,'user_id', 'timestamp', 'experiment_number', 'event_date', 'event_time']
df.sample(6)
#Checking for duplicated data
df.duplicated().sum()
#checking what kind of duplicates we have
duplicated_rows = df[df.duplicated()]
duplicated_rows.sample(15)
#checking the dates when the data was duplicated:
duplicated_rows['timestamp'].dt.date.unique()
So far we cannot see any pattern in the duplicates. There is data on different events, dates (from 30.07.2019 till 07.08.2019), and all experiment groups.
#checking the scope of the duplicated data:
for i in duplicated_rows:
    print(i, ":", duplicated_rows[i].nunique())
The server duplicated data across all 5 events, for 237 unique users, at 352 timestamps, in all 3 experiment groups.
The duplicated data is only 0.17% of the whole dataset (413/244126), thus we can drop it without skewing the analysis.
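The 0.17% share quoted above is straightforward arithmetic; a quick sketch with the counts reported in this section:

```python
# share of duplicated rows, using the counts reported above
n_duplicates = 413
n_total = 244126

share = n_duplicates / n_total
print(f"{share:.2%}")  # about 0.17%
```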
#getting rid of duplicated data
df = df.drop_duplicates().reset_index(drop = True)
#Checking data for zeros:
for i in df.columns:
    print(i, len(df[df[i]==0]))
# checking the number of experiments in every group:
df['experiment_number'].value_counts()
#statistical summary for categorical variables
df.describe(include=['category'])
#Identifying the number of unique users:
df['user_id'].value_counts().count()
df['user_id'].nunique()
df.shape
The data for analysis contains 244126 entries in 4 columns: data on 5 event types for 7551 unique users.
There are 5 unique EventName values, originally stored as the object type. For the sake of memory usage we have converted the column to the category type. "MainScreenAppear" is the most frequent category (119205 out of 244126).
Another type conversion has been performed on EventTimestamp: we have converted the unix time format to datetime and created two separate columns with the date and the time of the event.
Changing the data types has allowed us to improve the memory usage (it has dropped significantly: from 16.4 MB to 5.8 MB).
There were 413 duplicates (0.17%) in the data set; they have been successfully dropped.
No missing values have been found in the data set.
We have renamed the columns using clear names without upper-case letters.
The resulting data set has 6 columns and 243713 rows.
As we figured out in the previous part, there are 5 types of events in the data/funnel. Below is a graphical interpretation of the data.
#creating a table
df_events = df.groupby(['event_name']).agg({'user_id': 'count'}).reset_index()
#renaming columns
df_events.columns = ['event_name' ,'number']
df_events['event_name'] = df_events['event_name'].astype(str)
df_events.sort_values(by='number', ascending=False)
plt.figure(figsize=(12,5))
# plot bar chart
ax = sns.barplot(x=df_events.event_name, y=df_events.number, palette='mako')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set(xlabel="events", ylabel='users')
ax.set_xticklabels(df_events.event_name)
for item in ax.get_xticklabels(): item.set_rotation(0)
for i, v in enumerate(df_events["number"].items()):
    ax.text(i, v[1], "{:,}".format(v[1]), color='k', va='bottom', rotation=0)
plt.title('Number of different types of events')
plt.show()
#pie chart for types of events
colors = ['gold', 'mediumturquoise', 'darkorange', 'lightgreen', 'blue']
labels = df_events['event_name']
values = df_events['number']
fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.update_layout(title_text='Shares of different types of events')
fig.update_traces(hoverinfo='label+percent', textfont_size=20,
marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()
We can see that almost half of the events are MainScreen; OfferScreen and CartScreen have almost equal shares of 19-17.5%. The least frequent event is watching tutorials: either they do not have any value for the customers, or they are hard to find.
df['user_id'].nunique()
print('There are', df['user_id'].nunique(), 'unique users in the data set.')
#calculating events per user
df_events_users = df.groupby(['user_id']).agg({'event_name': 'count'}).sort_values(by='event_name', ascending=False).reset_index()
df_events_users
#checking up how many unique events the unique users performed:
unique_events_per_users = df.groupby(['user_id'])['event_name'].nunique().reset_index().groupby(['event_name'])['user_id'].nunique()
unique_events_per_users
Only 471 users had all 5 events.
df_events_users[['event_name']].describe()
#plotting the distribution
plt.figure(figsize=(9,5))
sns.histplot(df_events_users['event_name'], kde=True)  # sns.distplot is deprecated
plt.title('number of events per user')
plt.show()
# box-and-whisker plot
plt.figure(figsize=(9,5))
ax = sns.boxplot(x="event_name", data=df_events_users)
#Calculating the 95th and 99th percentiles for the number of events (setting outlier thresholds).
np.percentile(df_events_users['event_name'], [95, 99])
The number of events per user varies from 1 to 2307. The middle 50% of users had from 9 to 37 events.
No more than 5% of users had over 89 events, and only 1% of users had over 200 events. Thus users with over 200 events are outliers for our distribution.
The mean value is 32 events, the median value is 20. As we have a positively skewed distribution with a lot of outliers to the right, we will take the median as the average value for the distribution. Thus the average number of events per user is 20.
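The choice of the median over the mean can be illustrated on synthetic data (a sketch: the lognormal sample below is illustrative, not the project's data, with parameters picked so the skew resembles our distribution):

```python
import numpy as np

rng = np.random.default_rng(42)
# a positively skewed sample: many small values and a long right tail
sample = rng.lognormal(mean=3, sigma=1, size=10_000)

# the mean is pulled up by the tail; the median stays near the bulk of the users
print(f"mean:   {sample.mean():.1f}")
print(f"median: {np.median(sample):.1f}")
```

For a lognormal sample the mean systematically exceeds the median, which is the same pattern we see in the events-per-user distribution (32 vs 20).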
df[['event_date', 'event_time']].describe()
df['timestamp'].min()
df['timestamp'].max()
The data covers a period of 14 days, from 2019-07-25 04:43:36 to 2019-08-07 21:15:17.
dates_views = df.groupby(['event_date']).agg({'user_id':'count'}).reset_index()
dates_views.columns = ['event_date' ,'events_number']
dates_views.reset_index(drop=True)
dates_views.plot(x='event_date', y='events_number', kind='bar', grid=True, figsize=(20,5), cmap='Set2', title='Number of Events per Date')
plt.show()
fig = px.area(dates_views, x='event_date', y="events_number", title='Number of Events per Date')
fig.show()
#slicing the data
df_complete = df.query('event_date>="2019-08-01"')
df_complete['timestamp'].min()
df_complete['timestamp'].max()
Despite the fact that we have data starting from 25/07/2019, it is clear that the data is incomplete till 01/08/2019. That might have happened due to some technical reasons, so we have to slice the dataset accordingly. The actual data covers a period of one week, from 2019-08-01 00:07:28 till 2019-08-07 21:15:17.
df_excluded = df.query('event_date<"2019-08-01"')
df_excluded.info()
df_excluded['user_id'].nunique()
In the course of slicing we have lost 2826 entries (1.16% of the dataset) and 1451 unique users (19% of all users).
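Both percentages follow directly from the counts (a sketch with the figures reported above):

```python
# entries and users lost when slicing off the data before 2019-08-01
total_entries, lost_entries = 243713, 2826
total_users, lost_users = 7551, 1451

print(f"entries lost: {lost_entries / total_entries:.2%}")
print(f"users lost:   {lost_users / total_users:.2%}")
```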
#checking the groups:
df_complete.experiment_number.value_counts()
#checking the events:
df_complete.event_name.unique()
We still have users from all 3 experiment groups, as well as all the events, in the filtered data set.
There are 5 types of events in the data set: almost half of the logs are on the MainScreen, followed by OfferScreen and CartScreen with slightly less than 20% each; PaymentScreen got 14% of the logs, and the least popular type is Tutorial with only 0.4% of the logs.
The number of events per user varies greatly, from 1 to 2307; the middle 50% of users had from 9 to 37 events; the average number of events per user is 20.
There are logs from 2019-07-25 to 2019-08-07 in the data set, but study of the data distribution shows that the logs are incomplete till 2019-07-31 inclusive. Thus we had to drop the logs before 01/08/2019. Due to the slice we have lost 1.16% of entries (2826 entries) and 19% of unique users (1451 user IDs out of 7551 unique users).
This shows that the data on unique users was not evenly distributed. Some technical issues with users' logs could account for that.
The filtered data still has users from all three experimental groups.
[back to top](#top)
[previous part](#7)
We will define frequency of every event type.
events = df_complete.groupby('event_name').agg({'user_id': ['count']}).reset_index()
events.columns = ['event_name','frequency'] #renaming columns
events = events.sort_values(by='frequency', ascending=False)
events.head(10)
fig, ax = plt.subplots(figsize=(12, 7))
ax.set_title('Frequency of events', fontsize=16)
plt.xlabel('Type of event',fontsize=12)
plt.ylabel('frequency',fontsize=12)
colors = ['lightblue', 'mediumturquoise', 'orange', 'lightgreen', 'plum', 'bisque','lavender',
'lightcyan','palevioletred']
#plot
g = plt.bar(events.event_name, events.frequency,
0.7, label='Events', color=colors[1], alpha=0.8, edgecolor='black',linewidth=2)
#annotation
bar_label = events.frequency.tolist() #values for text
bar_label = [str(label) for label in bar_label]
def autolabel(rects):
    #iterate over the bars passed in (the original looped over the global g instead of rects)
    for idx, rect in enumerate(rects):
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2., 1.02*height,
                bar_label[idx],
                ha='center', va='bottom', rotation=0, size=15)
autolabel(g)
plt.legend()
plt.show()
#pie chart for types of events
colors = ['lightblue', 'mediumturquoise', 'orange', 'lightgreen', 'plum', 'bisque','lavender',
'lightcyan','palevioletred']
labels = events['event_name']
values = events['frequency']
fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.update_layout(title_text='Shares of different types of events')
fig.update_traces(hoverinfo='label+text+value', textfont_size=20,
marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()
According to our data there are 5 events; the most frequent one is MainScreen, followed by OfferScreen and CartScreen (19% and 18%), PaymentScreen is at 14%, and the least popular is the Tutorial event (0.4%).
First we will group the data by user ID and define the number of actions for every event for each user.
Then we will find the number of users who had each of these events by dropping those who have NaNs in any column.
df_complete['event_name'] = df_complete['event_name'].astype(str)
users_per_events = df_complete.pivot_table(index='user_id',columns='event_name',values='timestamp',aggfunc='count').reset_index()
users_per_events
#Picking up the users that had all the events
users_all_events = users_per_events[users_per_events.notnull().all(axis=1)].reset_index(drop=True)
users_all_events
users_all_events.shape
#picking all the user IDs that performed all the actions and saving them in a separate list
user_id_all_events = users_all_events['user_id'].tolist()
user_id_all_events
#number of unique users that performed all the events besides Tutorial.
users_all_events_without_tutorial =users_per_events.loc[:, users_per_events.columns != 'Tutorial']
users_all_events_without_tutorial = users_all_events_without_tutorial[users_all_events_without_tutorial.notnull().all(axis=1)].reset_index(drop=True)
users_all_events_without_tutorial.shape
This way we have received a table with user IDs that performed all the events, and a list of IDs of those users.
There are 466 users out of 7534 that performed all the events.
#Calculating the number of sessions for each event
sums = users_all_events.select_dtypes(np.number).sum().rename('total')  # np.float was removed in newer numpy
# append the totals row to the data frame (DataFrame.append is deprecated)
events_users_total = pd.concat([users_all_events, sums.to_frame().T])
events_users_total.tail(1)
MainScreenAppear and OffersScreenAppear have almost the same number of sessions; slightly fewer users got to the CartScreen. The Tutorial screen had the fewest sessions.
We will group the dataset by event name and calculate the number of unique users for each event.
#filtering out the data base containing only users that completed all events.
events_all = df_complete[df_complete['user_id'].isin(user_id_all_events)]
events_all.head()
events_unique_users = df_complete.groupby('event_name').agg({'user_id': ['nunique']}).reset_index()
events_unique_users.columns = ['event_name','users_number'] #renaming columns
events_unique_users = events_unique_users.sort_values(by='users_number', ascending=False)
events_unique_users.head(10)
Visualising the calculations.
plt.figure(figsize=(12,5))
# plot bar chart
ax = sns.barplot(x=events_unique_users.event_name, y=events_unique_users.users_number, palette='mako')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set(xlabel="events", ylabel='users')
ax.set_xticklabels(events_unique_users.event_name)
for item in ax.get_xticklabels(): item.set_rotation(0)
for i, v in enumerate(events_unique_users["users_number"].items()):
    ax.text(i, v[1], "{:,}".format(v[1]), color='k', va='bottom', rotation=0)
plt.title('Unique users per event')
plt.show()
The bar chart illustrates that the biggest number of unique users performed 'MainScreenAppear'; 'OffersScreenAppear', 'CartScreenAppear' and 'PaymentScreenSuccessful' have approximately the same number of unique users (from 3,500 to 4,500). The least popular event was 'Tutorial' with only 840 unique users.
events_at_least_once = df_complete.groupby('event_name').agg({'user_id': ['nunique', 'count']}).reset_index()
events_at_least_once.columns = ['event_name','users_unique', 'total_users'] #renaming columns
events_at_least_once.head(10)
#calculating the proportion of users who performed the action at least once:
#share of users that performed event at least once in comparison to total number of users that performed the event:
events_at_least_once['users_once_proportion'] =(
events_at_least_once['users_unique']
/ events_at_least_once['total_users']).round(2)
#share of users that performed event at least once per each event
events_at_least_once['users_unique_proportion'] = (
    events_at_least_once['users_unique']
    / df_complete['user_id'].nunique()).round(2)
events_at_least_once = events_at_least_once.sort_values(by='users_once_proportion', ascending=False)
events_at_least_once
Visualising the calculations.
plt.figure(figsize=(12,5))
# plot bar chart
ax = sns.barplot(x=events_at_least_once.event_name, y=events_at_least_once.users_once_proportion, palette='mako')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set(xlabel="events", ylabel='Percentage of users')
ax.set_xticklabels(events_at_least_once.event_name)
for item in ax.get_xticklabels(): item.set_rotation(0)
for i, v in enumerate(events_at_least_once["users_once_proportion"].items()):
    ax.text(i, v[1], "{:.1f}%".format(v[1]*100), color='k', va='bottom', rotation=0)
plt.title('Percentage of users that performed the action at least once (to total number of users that performed the event)')
plt.show()
plt.figure(figsize=(12,5))
# plot bar chart
ax = sns.barplot(x=events_at_least_once.event_name, y=events_at_least_once.users_unique_proportion, palette='mako')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set(xlabel="events", ylabel='Percentage of users')
ax.set_xticklabels(events_at_least_once.event_name)
for item in ax.get_xticklabels(): item.set_rotation(0)
for i, v in enumerate(events_at_least_once["users_unique_proportion"].items()):
    ax.text(i, v[1], "{:.1f}%".format(v[1]*100), color='k', va='bottom', rotation=0)
plt.title('Percentage of users that performed the action at least once (relative to the total number of unique users)')
plt.show()
Among the events performed at least once, MainScreen is the leader with 37%. Then go the Offer, Payment and Cart events with almost equal shares (18-23%), and the Tutorial event is the least popular with only 4%.
84% of all users that got to the Tutorial screen viewed it only once; only 16% of users came back to it.
To the contrary, only 2% of users that got to the MainScreen never came back; all the others (98%) viewed this screen more than once. OfferScreen, PaymentScreen and CartScreen were viewed once by 50-60% of all users, thus 40-50% of users came back to these screens more than once.
Now we will define the proportion of users who performed each action only once.
First we will count how many users performed the action exactly once, then the total number of users that performed the action, and finally we will compute the proportion.
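The per-event queries below can also be collapsed into one vectorized pass. A sketch on a toy stand-in for users_per_events (event counts per user, NaN where the user never triggered the event; the numbers are made up for illustration):

```python
import numpy as np
import pandas as pd

# toy stand-in for users_per_events
toy = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'CartScreenAppear': [1.0, 2.0, np.nan, 1.0],
    'MainScreenAppear': [3.0, 1.0, 1.0, np.nan],
})
counts = toy.drop(columns='user_id')

once = (counts == 1).sum()      # users with exactly one occurrence, per event
ever = counts.notna().sum()     # users with at least one occurrence, per event
proportion_once = (once / ever).round(2)
print(proportion_once)
```

Applied to the real users_per_events table this yields the same proportions as the step-by-step queries below.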
#coming back to the table of events/sessions per user
users_per_events
#number of users who performed the action CartScreenAppear once
Users_one_CartScreenAppear = users_per_events.query('CartScreenAppear==1.0')['user_id'].count()
Users_one_CartScreenAppear
#total number of users who performed the action CartScreenAppear
Users_all_CartScreenAppear = users_per_events.query('CartScreenAppear>=1.0')['user_id'].count()
Users_all_CartScreenAppear
#number of users who performed the action MainScreenAppear once
Users_one_MainScreenAppear = users_per_events.query('MainScreenAppear==1.0')['user_id'].count()
Users_one_MainScreenAppear
Users_all_MainScreenAppear = users_per_events.query('MainScreenAppear>=1.0')['user_id'].count()
Users_all_MainScreenAppear
#number of users who performed the action OffersScreenAppear once
Users_one_OffersScreenAppear = users_per_events.query('OffersScreenAppear==1.0')['user_id'].count()
Users_one_OffersScreenAppear
Users_all_OffersScreenAppear = users_per_events.query('OffersScreenAppear>=1.0')['user_id'].count()
Users_all_OffersScreenAppear
#number of users who performed the action PaymentScreenSuccessful once
Users_one_PaymentScreenSuccessful = users_per_events.query('PaymentScreenSuccessful==1.0')['user_id'].count()
Users_one_PaymentScreenSuccessful
Users_all_PaymentScreenSuccessful = users_per_events.query('PaymentScreenSuccessful>=1.0')['user_id'].count()
Users_all_PaymentScreenSuccessful
#number of users who performed the action Tutorial once
Users_one_Tutorial = users_per_events.query('Tutorial==1.0')['user_id'].count()
Users_one_Tutorial
#total number of users who performed the action Tutorial
Users_all_Tutorial = users_per_events.query('Tutorial>=1.0')['user_id'].count()
Users_all_Tutorial
Now we will form a table with the received data.
lst = [['MainScreenAppear', Users_one_MainScreenAppear, Users_all_MainScreenAppear],
['OffersScreenAppear', Users_one_OffersScreenAppear, Users_all_OffersScreenAppear],
['CartScreenAppear', Users_one_CartScreenAppear, Users_all_CartScreenAppear],
['PaymentScreenSuccessful', Users_one_PaymentScreenSuccessful, Users_all_PaymentScreenSuccessful],
['Tutorial', Users_one_Tutorial, Users_all_Tutorial]]
users_events_proportion = pd.DataFrame(lst, columns =['events', 'users_one_event', 'all_users_event'])
users_events_proportion
#calculating the proportion of users who performed the action once:
users_events_proportion['users_once_proportion'] =users_events_proportion['users_one_event'] / users_events_proportion['all_users_event']
users_events_proportion['users_once_proportion'] = users_events_proportion['users_once_proportion'].round(2)
users_events_proportion
Visualising the calculations.
plt.figure(figsize=(12,5))
# plot bar chart
ax = sns.barplot(x=users_events_proportion.events, y=users_events_proportion.users_once_proportion, palette='mako')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set(xlabel="events", ylabel='Percentage of users')
ax.set_xticklabels(users_events_proportion.events)
for item in ax.get_xticklabels(): item.set_rotation(0)
for i, v in enumerate(users_events_proportion["users_once_proportion"].items()):
    ax.text(i, v[1], "{:.1f}%".format(v[1]*100), color='k', va='bottom', rotation=0)
plt.title('Percentage of users that performed the action only once')
plt.show()
The chart shows that the least popular category is still Tutorial: 90% of users never come back to this page. The most popular page is MainScreen: only 3% of users saw it once and did not come back to it again.
We suppose that for most users the funnel looks as follows: 'MainScreenAppear' → 'OffersScreenAppear' → 'CartScreenAppear' → 'PaymentScreenSuccessful'.
The 'Tutorial' event is an unnecessary step in this process: it can take place at any stage or not take place at all. We will not take it into account when calculating the funnel.
(Calculating the funnel for the sequence of events MainScreenAppear → OffersScreenAppear → CartScreenAppear → PaymentScreenSuccessful and the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)
#table for calculating funnel
users = df_complete.query('event_name!="Tutorial"').pivot_table(
index='user_id',
columns='event_name',
values=['timestamp'],
aggfunc='min').reset_index()
users.columns = ['user_id', 'CartScreenAppear', 'MainScreenAppear', 'OffersScreenAppear', 'PaymentScreenSuccessful']
users.head()
#calculating number of users on each step of the funnel according to the time when the action took place
step_1 = ~users['MainScreenAppear'].isna()
step_2 = step_1 & (users['OffersScreenAppear'] > users['MainScreenAppear'])
step_3 = step_2 & (users['CartScreenAppear'] > users['OffersScreenAppear'])
step_4 = step_3 & (users['PaymentScreenSuccessful'] > users['CartScreenAppear'])
n_MainScreen = users[step_1].shape[0]
n_OffersScreen = users[step_2].shape[0]
n_CartScreen = users[step_3].shape[0]
n_PaymentScreen = users[step_4].shape[0]
print('Visitors:', n_MainScreen)
print('Viewed the offer:', n_OffersScreen)
print('Added product to a cart:', n_CartScreen)
print('Paid:', n_PaymentScreen)
In order to calculate the funnel metrics and build the chart we will use the Funnel() method. We will build it on the basis of the previous calculations (the events_unique_users table).
#forming a table for the funnel chart
list1 = [['MainScreenAppear', n_MainScreen],
['OffersScreenAppear', n_OffersScreen],
['CartScreenAppear', n_CartScreen],
['PaymentScreenSuccessful', n_PaymentScreen]]
events_unique_users_funnel = pd.DataFrame(list1, columns =['events', 'users_number'])
events_unique_users_funnel
#calculating ratio of the users to the previous stage:
events_unique_users_funnel['ratio']= (events_unique_users_funnel['users_number']
.div(events_unique_users_funnel['users_number'].shift(1))
.round(2)
)
#fillna is not in-place, so assign the result back
events_unique_users_funnel = events_unique_users_funnel.fillna(0)
events_unique_users_funnel['percentage_ch'] = events_unique_users_funnel['users_number'].pct_change()*100
events_unique_users_funnel = events_unique_users_funnel.fillna(0)
events_unique_users_funnel
Checking the funnel:
plt.figure(figsize=(12,5))
# plot bar chart
ax = sns.barplot(x=events_unique_users_funnel.events, y=events_unique_users_funnel.percentage_ch, palette='mako')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set(xlabel="events", ylabel='Percentage of users')
ax.set_xticklabels(events_unique_users_funnel.events)
for item in ax.get_xticklabels(): item.set_rotation(0)
for i, v in enumerate(events_unique_users_funnel["percentage_ch"].items()):
    ax.text(i, v[1], "{:.1f}%".format(v[1]), color='k', va='bottom', rotation=0)
plt.title('Bar chart for the events funnel (percentage of users lost at every stage)')
plt.show()
We can see that the share of users who do not make it to the next stage grows with every step of the funnel, with the biggest losses at the last stage (74% of users that get to the cart screen do not complete the payment).
#building the funnel
fig = go.Figure(go.Funnel(y = events_unique_users_funnel.events, x = events_unique_users_funnel.users_number,
opacity = 0.85, marker = {"color": ['#333160', '#5b5895', '#3ac7c3' , '#c6f7e7']}))
fig.show()
The funnel chart illustrates that the smallest percentage of users is lost at the very first step: 57% of users that got to the MainScreen get to the OffersScreen.
At the next step, 58% do not get to the CartScreen and are lost. The losses at the last step are the biggest: just 26% of users that got to the CartScreen make a successful payment.
This way, 6% of users that got to the MainScreen make a purchase.
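The 6% figure can be verified by chaining the step-to-step ratios, which telescope to the last count over the first:

```python
# step counts from the funnel calculated above
main, offers, cart, paid = 7419, 4201, 1767, 454

# overall conversion is the product of the step ratios, i.e. paid / main
overall = (offers / main) * (cart / offers) * (paid / cart)
print(f"{overall:.1%}")
```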
There are 5 events forming the funnel; the most frequent one is MainScreen, followed by OfferScreen and CartScreen (19% and 18%), PaymentScreen is at 14%, and the least popular is the Tutorial event (0.4%).
There are only 466 unique users that went through all of the events (out of 7534 users, 6.2%). If we omit the Tutorial event as the least popular one, the number of users that performed all the events is much higher: 3429 users.
If we count the number of sessions per event, we see that the MainScreen and OffersScreen events have very close numbers of sessions (7400-7500), slightly fewer sessions for the Cart event (6400), and even fewer for the Payment event (4300).
As to the events performed at least once, MainScreen is the leader: 37% of users viewed this screen at least once. Then go the Offer, Payment and Cart events with almost equal shares (18-23%), and the Tutorial event is the least popular with only 4% of users.
84% of all users that got to the Tutorial screen viewed it only once; only 16% of users came back to it.
To the contrary, only 6% of users that got to the MainScreen never came back; all the others (94%) viewed this screen more than once. OfferScreen, PaymentScreen and CartScreen were viewed once by 9-10% of all users, thus 90-91% of users came back to these screens more than once.
If we count the users that performed an event only once, the least popular category is still Tutorial: 90% of users never come back to this page. The most popular page is MainScreen: only 3% of users saw it once and did not come back again.
Next we calculated the funnel for the sequence of events MainScreenAppear → OffersScreenAppear → CartScreenAppear → PaymentScreenSuccessful, as Tutorial is obviously not part of the main funnel.
We got the following numbers: MainScreen visitors: 7419, viewed the offer page: 4201, added a product to the cart: 1767, paid: 454. This gives a 6% conversion rate for this funnel. Other users might get converted through a different order of steps, because the overall number of users that got to the Payment page is higher (3539 unique users).
The biggest share of users is lost at the payment stage: only 26% of users that put products in the cart eventually pay for them. This stage should be studied more thoroughly.
Possible issues to improve:
#calculating the number of unique users for each group
experiment_groups = df_complete.groupby('experiment_number').agg({'user_id': 'nunique'}).reset_index()
experiment_groups
plt.figure(figsize=(9,5))
# plot bar chart
ax = sns.barplot(x=experiment_groups.experiment_number, y=experiment_groups.user_id, palette='mako')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set(xlabel="experiment no.", ylabel='Number of users')
ax.set_xticklabels(experiment_groups.experiment_number)
for item in ax.get_xticklabels(): item.set_rotation(0)
for i, v in enumerate(experiment_groups["user_id"].items()):
    ax.text(i, v[1], "{:,}".format(v[1]), color='k', va='bottom', rotation=0)
plt.title('Number of users in each group')
plt.show()
We have 3 groups with almost equal numbers of users. Still, there is a difference that might skew the further results, so we will pick 2484 users randomly with the sample() method.
#checking criteria for A/A tests (The number of users in different groups doesn't vary by more than 1%)
experiment_groupsAA = experiment_groups[experiment_groups.experiment_number!=248]
experiment_groupsAA.columns = ['experiment_number', 'users_number']
experiment_groupsAA['ratio']= (experiment_groupsAA['users_number']
.div(experiment_groupsAA['users_number'].shift(1))
.round(3)
)
experiment_groupsAA
The numbers of users differ by slightly more than 1%, thus we will pick 2484 users randomly with the sample() method.
import random
random.seed(42)  # fixing the seed so the sampling below is reproducible
users_247_raw_ID = df_complete[df_complete.experiment_number==247]['user_id'].unique().tolist()
users_247_ID = random.sample(users_247_raw_ID, k=2484)
display(users_247_ID) #we got list of 2484 unique user IDs for group 247
#check
len(users_247_raw_ID)
#check
len(users_247_ID)
#setting a df with visitors from group 247 equal to group 246
#users_247 = df_complete.query('user_id == @users_247_ID')
users_247 = df_complete[df_complete['user_id'].isin(users_247_ID)]
users_247.sample()
users_246 = df_complete.query('experiment_number == 246')
display(users_246.sample())
#combining the 2 data sets: joining the two tables with 246 experiment visitors and 247 experiment visitors together
frames = [users_246, users_247]
users_246_247 = pd.concat(frames)
users_246_247.head()
#grouping events by experiment group:
events_246_247 = users_246_247.pivot_table(index='event_name', values='user_id', columns='experiment_number', aggfunc=lambda x: x.nunique()).reset_index()
events_246_247
Now we are going to check whether groups 246 and 247 have any statistically significant difference.
First we run an A/A test for groups 246 and 247 to check that the data for the A/B test was collected correctly.
We are going to use a proportion z-test for the analysis, as its requirements are met.
#testing proportions
def check_hypothesis(group1, group2, event, alpha):
    successes1 = events_246_247[events_246_247.event_name==event][group1].iloc[0]
    successes2 = events_246_247[events_246_247.event_name==event][group2].iloc[0]
    #for trials we will go back to the previous df with pre-aggregated data
    trials1 = users_246_247[users_246_247.experiment_number==group1]['user_id'].nunique()
    trials2 = users_246_247[users_246_247.experiment_number==group2]['user_id'].nunique()
    print("Success for group", group1, "is", successes1, "for event", event, "\n",
          "Success for group", group2, "is", successes2, "for event", event, "\n",
          "Trials for group", group1, "is", trials1, "\n",
          "Trials for group", group2, "is", trials2, "\n")
The most popular event is MainScreenAppear, so we will run the check_hypothesis function for this event first.
As this is an A/A test, we set alpha to 1%. One of the criteria for a successful A/A test is as follows: "None of the key metrics varies by a statistically significant amount—typically by no more than 1%".
check_hypothesis(246,247, 'MainScreenAppear', alpha=0.01)
Stating the hypotheses.
The null hypothesis (H0): the proportions are the same.
The alternative hypothesis (H1): the proportions are not the same.
The whole function:
#testing the hypotheses:
def check_hypothesis(group1, group2, event, alpha):
    #successes: unique users who performed the event in each group
    successes1 = events_246_247[events_246_247.event_name==event][group1].iloc[0]
    successes2 = events_246_247[events_246_247.event_name==event][group2].iloc[0]
    #trials: total unique users in each group, from the previous df with pre-aggregated data
    trials1 = users_246_247[users_246_247.experiment_number==group1]['user_id'].nunique()
    trials2 = users_246_247[users_246_247.experiment_number==group2]['user_id'].nunique()
    #proportion of successes in the first group
    p1 = successes1/trials1
    #proportion of successes in the second group
    p2 = successes2/trials2
    #proportion in the combined dataset
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    difference = p1 - p2
    z_value = difference / math.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))
    distr = st.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    print('p-value: ', p_value)
    if p_value < alpha:
        print("Reject H0 for", event, 'and groups', group1, 'and', group2)
    else:
        print("Fail to reject H0 for", event, 'and groups', group1, 'and', group2)
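As a quick sanity check on the arithmetic inside check_hypothesis, the same pooled-proportion z-test can be written as a standalone function and probed with hypothetical counts:

```python
import math
from scipy import stats as st

def proportions_z_pvalue(successes1, trials1, successes2, trials2):
    # pooled proportion under H0: p1 == p2
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    difference = successes1 / trials1 - successes2 / trials2
    z_value = difference / math.sqrt(
        p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2)
    )
    # two-sided p-value from the standard normal distribution
    return 2 * (1 - st.norm.cdf(abs(z_value)))

# identical proportions -> z = 0 -> p-value = 1 (no evidence of a difference)
print(proportions_z_pvalue(1200, 2484, 1200, 2484))  # 1.0
# clearly different proportions (hypothetical counts) -> tiny p-value
print(proportions_z_pvalue(1500, 2484, 1200, 2484) < 0.01)  # True
```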
Now we are going to run the function for the two groups comparing all the events.
for i in events_246_247.event_name.unique():
    check_hypothesis(246, 247, i, alpha=0.01)
The data does not provide sufficient evidence, at the significance level we selected (1%), to reject the null hypothesis. Therefore we conclude that the proportion of users performing each event does not differ between the groups.
There is no statistically significant difference in the A/A test results (experiment groups 246 and 247), so we can trust that the groups were split correctly.
Our sample size is large enough for both the t-test and the z-test, so we will use the z-test first and then cross-check the calculations with a t-test.
Now we are going to form a general table for all 3 experiments with equal number of users.
#picking up the same number of user IDs as for the previous groups (2484)
users_248_raw_ID = df_complete[df_complete.experiment_number==248]['user_id'].unique().tolist()
users_248_ID = random.sample(users_248_raw_ID, k=2484)
#setting a df with the required visitor IDs (group 248 sample size = groups 247 and 246)
users_248 = df_complete[df_complete['user_id'].isin(users_248_ID)]
users_248.sample()
#combining the data sets: joining the tables for all 3 experiment groups (246, 247 and 248):
frames = [users_246_247, users_248]
users_246_247_248 = pd.concat(frames)
users_246_247_248.head()
#grouping events by experiment group:
events_246_247_248 = users_246_247_248.pivot_table(index='event_name', values='user_id', columns='experiment_number', aggfunc=lambda x: x.nunique()).reset_index()
events_246_247_248
We will use the same function to test the hypotheses for the A/B test. As we are going to run 3 comparisons (246/247, 246/248, 247/248) we will adjust the significance level accordingly, using the Šidák method. Normally alpha is set to 0.05, a commonly accepted level in industry.
Calculating alpha with the Šidák method (it guarantees a family-wise error rate below α); the required significance level is found as follows:
alpha_sidak = 1 - (1 - 0.05)**(1/3) #we have 3 groups which gives us 3 combinations for comparison (246/247, 246/248, 247/248)
round(alpha_sidak,4)
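For comparison, the Bonferroni correction (alpha / m) gives a slightly stricter threshold than Šidák; the Šidák threshold keeps the family-wise error rate at exactly the desired level. A small check:

```python
m = 3                 # number of pairwise comparisons (246/247, 246/248, 247/248)
alpha_family = 0.05   # desired family-wise error rate

alpha_sidak = 1 - (1 - alpha_family) ** (1 / m)   # ≈ 0.0170
alpha_bonferroni = alpha_family / m               # ≈ 0.0167, slightly stricter

# Šidák keeps the family-wise error rate at exactly alpha_family
print(round(1 - (1 - alpha_sidak) ** m, 10))  # 0.05
```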
Stating the hypotheses (H0 states an unchanged result):
H0: there is no difference between groups A and B in the number of users who performed the actions/events.
H1: there is a statistically significant difference between groups A and B in the number of users who performed the actions/events.
def check_hypothesis(group1, group2, event, alpha):
    #successes: unique users who performed the event in each group
    successes1 = events_246_247_248[events_246_247_248.event_name==event][group1].iloc[0]
    successes2 = events_246_247_248[events_246_247_248.event_name==event][group2].iloc[0]
    #trials: total unique users in each group, from the previous df with pre-aggregated data
    trials1 = users_246_247_248[users_246_247_248.experiment_number==group1]['user_id'].nunique()
    trials2 = users_246_247_248[users_246_247_248.experiment_number==group2]['user_id'].nunique()
    #proportion of successes in the first group
    p1 = successes1/trials1
    #proportion of successes in the second group
    p2 = successes2/trials2
    #proportion in the combined dataset
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    difference = p1 - p2
    z_value = difference / math.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))
    distr = st.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    print('p-value: ', p_value)
    if p_value < alpha:
        print("Reject H0 for", event, 'and groups', group1, 'and', group2)
    else:
        print("Fail to reject H0 for", event, 'and groups', group1, 'and', group2)
#Comparison of events for 247/248
for i in events_246_247_248.event_name.unique():
    check_hypothesis(247, 248, i, alpha_sidak)
#Comparison of events for 246/248
for i in events_246_247_248.event_name.unique():
    check_hypothesis(246, 248, i, alpha_sidak)
#Comparison of events for 246/247 (with the significance level changed to alpha_sidak)
for i in events_246_247_248.event_name.unique():
    check_hypothesis(246, 247, i, alpha_sidak)
The data does not provide sufficient evidence, at the significance level we selected (≈1.7%), to reject the null hypothesis. Therefore we conclude that the proportion of users performing each event does not differ between the 3 groups (246, 247, 248) at any stage of the funnel. Changing the fonts has no detectable influence on the users.
Now we are going to test the difference between the groups overall, to confirm the previous calculations with a t-test.
#forming a table
samples = users_246_247_248.pivot_table(index='event_date', values='user_id', columns='experiment_number', aggfunc=lambda x: x.nunique()).reset_index()
samples.columns = ['date', '246', '247', '248']
samples
Creating variables with the samples for each group.
sample_246 = samples.loc[:, samples.columns == '246'].values.tolist()
sample_246 = [i[0] for i in sample_246]
sample_246
#Shapiro-Wilk test for normality of the daily user counts in group 246
results = st.shapiro(sample_246)
results
sample_247 = samples.loc[:, samples.columns == '247'].values.tolist()
sample_247 = [i[0] for i in sample_247]
sample_247
sample_248 = samples.loc[:, samples.columns == '248'].values.tolist()
sample_248 = [i[0] for i in sample_248]
sample_248
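The Shapiro-Wilk check above was run only for sample_246; it can be applied to all three groups in one loop. A sketch with made-up daily unique-user counts (the real values come from the samples table):

```python
from scipy import stats as st

# hypothetical daily unique-user counts for the three groups
samples = {
    '246': [330, 341, 350, 329, 338, 345, 333],
    '247': [328, 339, 352, 331, 340, 342, 336],
    '248': [325, 344, 348, 327, 335, 347, 331],
}
for name, sample in samples.items():
    stat, p = st.shapiro(sample)
    # a p-value above the chosen alpha means no evidence against normality
    print(name, round(p, 3))
```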
Next we compute the variance of the 3 samples to decide whether we can treat them as equal for the t-test.
variance_246 = np.var(sample_246)
print('Variance for 246 sample is ', variance_246.round(1))
variance_248 = np.var(sample_248)
print('Variance for 248 sample is ', variance_248.round(1))
variance_247 = np.var(sample_247)
print('Variance for 247 sample is ', variance_247.round(1))
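One caveat: np.var defaults to ddof=0 (the population variance, dividing by n); for a sample drawn from a larger population the unbiased estimate divides by n − 1 (ddof=1). A small check with toy numbers:

```python
import numpy as np

x = [2.0, 4.0, 6.0, 8.0]    # mean = 5.0, squared deviations sum to 20
print(np.var(x))            # 5.0       (population variance, divides by 4)
print(np.var(x, ddof=1))    # 6.666...  (sample variance, divides by 3)
```

For comparing whether the variances are roughly equal, either convention works as long as it is applied consistently to all three samples.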
The variances of the samples are not equal.
As we have samples of continuous data drawn from approximately normally distributed populations, but with unequal variances, we will conduct an unpaired two-tailed t-test in Welch's version (in Python, ttest_ind with equal_var = False) to test the stated hypotheses.
Stating the hypotheses (H0 states an unchanged result):
H0: there is no difference between the two groups in the number of users who performed the actions/events.
H1: there is a statistically significant difference between the two groups in the number of users who performed the actions/events.
# equal_var=False, as the previous calculations showed that the 3 samples have different variances
results_246_247 = st.ttest_ind(sample_246, sample_247, equal_var=False)
results_247_248 = st.ttest_ind(sample_247, sample_248, equal_var=False)
results_246_248 = st.ttest_ind(sample_246, sample_248, equal_var=False)
#we are going to use alpha_sidak here as there were 3 comparisons made
print('p-value to compare the groups 246 and 247: ', results_246_247.pvalue)
print('p-value to compare the groups 247 and 248: ', results_247_248.pvalue)
print('p-value to compare the groups 246 and 248: ', results_246_248.pvalue)
if results_246_247.pvalue < alpha_sidak:
    print("Null hypothesis rejected for groups 246 and 247")
else:
    print("Null hypothesis not rejected for groups 246 and 247")
if results_247_248.pvalue < alpha_sidak:
    print("Null hypothesis rejected for groups 247 and 248")
else:
    print("Null hypothesis not rejected for groups 247 and 248")
if results_246_248.pvalue < alpha_sidak:
    print("Null hypothesis rejected for groups 246 and 248")
else:
    print("Null hypothesis not rejected for groups 246 and 248")
The data does not provide sufficient evidence, at the significance level we selected (≈1.7%), to reject the null hypothesis. Therefore we conclude that the average daily number of unique users does not differ between the 3 groups (246, 247, 248) (μ1 = μ2 = μ3).
All the tests (proportion z-test and t-test) run on the 3 experiment groups, both with and without splitting by events, show no statistically significant difference between the groups. Changing the fonts has no detectable impact on users.
In the course of data preprocessing we studied a dataset with 244126 entries, 5 event types and 3 experiment groups for 7551 unique users.
We have:
There are 5 types of events in the data set: almost half of the logs was on the MainScreen and the least popular type is Tutorial with only 0.4% of logs.
Number events per user vary greatly, from 1 to 2307; middle 50% of users had from 9 to 37 events; the average number of events per user is 20.
There are logs from 2019-07-25 to 2019-08-07 in the data set, but the study of the data distribution shows that the log data is incomplete up to and including 2019-07-31. Thus we dropped the logs before 01/08/2019, losing 16% of entries (2826 entries) and 19% of unique users.
Some technical issues with users' logs could account for that.
The filtered data still has users from all three experimental groups.
There are 4 events forming the funnel; the most frequent one is MainScreen (49% of logs), followed by OfferScreen and CartScreen (19% and 18%) and PaymentScreen (14%). The least popular event (Tutorial, with 0.4% of logs) is not a necessary step of the funnel and we do not include it in the funnel calculations.
There are only 466 unique users that went through all of the events (6.2% of all users). If we omit the Tutorial event as the least popular one, the number of users that performed all the events is much higher: 3429 users.
As for events performed at least once, MainScreen is the leader: 37% of users viewed this screen at least once. Then come the Offer, Payment and Cart events with almost equal shares (18-23%), and Tutorial is the least popular with only 4% of users.
84% of all users that got to the Tutorial screen viewed it once. That means that only 16% of users came back to it.
By contrast, only 6% of users that got to the MainScreen never came back; all the others (94%) viewed this screen more than once. OfferScreen, PaymentScreen and CartScreen were viewed only once by 9-10% of all users, so 90-91% of users came back to these screens more than once.
The funnel is calculated for the following sequence of events MainScreenAppear → OffersScreenAppear → CartScreenAppear → PaymentScreenSuccessful as Tutorial obviously is not part of the main funnel.
We have got the following numbers: MainScreen visitors: 7419 and paid: 454. This way we got 6% conversion rate for this funnel. Other users might get converted going through another steps, because general number of users that got to Payment page is higher (3539 unique users).
The biggest share of users is lost at the payment stage: only 26% of users that put products into the cart eventually pay for them. This stage should be studied more thoroughly for possible issues to improve.
We ran an A/A/B test on 3 groups. First we verified that the test was designed correctly: there is no statistically significant difference between the A/A groups (246 and 247).
Then we ran the same test comparing group 248 with 246 and 247. We randomly drew equal sample sizes for each group and corrected the significance level with the Šidák method, as we ran multiple comparisons.
All tests showed that, at the given significance level, we cannot conclude that there is a difference between the tested groups. The samples did not provide sufficient evidence that altering the fonts has any effect.