In the course of this project we are going to analyse a list of hypotheses and the results of an A/B test.
In Part 1 we will rank hypotheses that may help boost the revenue of a big online store. To do this we will look into the data stored in the "hypotheses" table and prioritize every hypothesis using two approaches (ICE and RICE).
In Part 2 we are going to analyse the A/B test results collected during a one-month experiment. We will look into the data stored in two tables: orders_us and visits_us. For the purpose of the analysis we will split the collected data into two equal groups and calculate the statistical significance of the difference in average order size and in conversion rates. This way we will check whether the difference in the key metrics of these groups is due to random chance or due to the treatment.
In the end we will conclude whether the experiment was a success and whether we should continue collecting data or stop.
Part 1. Prioritizing Hypotheses
Part 2. A/B Test Analysis
Data Study
1. Cumulative revenue by group
2. Cumulative average order size by group
3. Relative difference in cumulative average order size for group B compared with group A
4. Conversion rate
5. Scatter chart of the number of orders per user
6. 95th and 99th percentiles for the number of orders per user
7. Scatter chart of order prices
8. 95th and 99th percentiles of order prices
9. Statistical significance of the difference in conversion between the groups using the raw data
10. Statistical significance of the difference in average order size between the groups using the raw data
11. Statistical significance of the difference in conversion between the groups using the filtered data
12. Statistical significance of the difference in average order size between the groups using the filtered data
13. Decision based on the test results
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats as st
import seaborn as sns
from IPython.display import display
hypotheses = pd.read_csv('./hypotheses_us.csv', sep=';')
display(hypotheses)
#lowercasing all column names of the hypotheses df
hypotheses.columns = map(str.lower, hypotheses.columns)
hypotheses.info()
hypotheses['hypothesis'].value_counts()
We are going to calculate the ICE score as follows: ICE = (impact * confidence) / effort
#adding the ICE column to the table:
hypotheses['ICE'] = (hypotheses['impact']*hypotheses['confidence'])/hypotheses['effort']
#printing the result:
print(hypotheses[['hypothesis','ICE']].sort_values(by='ICE', ascending=False))
According to the ICE framework the most highly ranked hypotheses are nos. 8 and 0: "Launch a promotion that gives users discounts on their birthdays" and "Add two new channels for attracting traffic. This will bring 30% more users".
This method is helpful when it is important to understand how many of our customers a given feature will benefit. We are going to calculate the RICE score the following way:
RICE = (reach * impact * confidence) / effort
#adding the RICE column to the table:
hypotheses['RICE'] = (hypotheses['reach']*hypotheses['impact']*hypotheses['confidence'])/hypotheses['effort']
#printing the result:
print(hypotheses[['hypothesis','RICE']].sort_values(by='RICE', ascending=False))
Taking the reach factor into account has changed the ranking dramatically. Now the top-priority hypotheses are nos. 7 and 2: "Add a subscription form to all the main pages. This will help you compile a mailing list" and "Add product recommendation blocks to the store's site. This will increase conversion and average purchase size".
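To make the effect of the reach factor concrete, here is a tiny toy illustration with made-up scores (not taken from the dataset): a wide-reach hypothesis with a modest ICE score overtakes a narrow-reach one with a higher ICE score once reach is multiplied in.
# toy illustration with hypothetical scores: how reach reshuffles the ranking
toy = pd.DataFrame(
    {'reach': [10, 2], 'impact': [5, 9], 'confidence': [8, 9], 'effort': [4, 4]},
    index=['wide reach', 'narrow reach'])
toy['ICE'] = toy['impact'] * toy['confidence'] / toy['effort']   # 10.0 vs 20.25
toy['RICE'] = toy['reach'] * toy['ICE']                          # 100.0 vs 40.5
print(toy)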
#heat map for the ranked hypotheses:
hypotheses_ranged = hypotheses.drop(['hypothesis','reach', 'impact', 'confidence', 'effort'], axis=1)
sns.set(style='white')
plt.figure(figsize=(20, 3))
plt.title('Ranked hypotheses', fontsize=16)
sns.heatmap(hypotheses_ranged.T, annot=True, fmt='g', linewidths=1, linecolor='grey');
Hypothesis ranking looks different under the two frameworks. If we want to take as many factors as possible into consideration, we should start by testing the RICE top-ranked hypotheses: nos. 7, 2, 0 and 6. This way we will prioritize the hypotheses that reach the maximum number of customers.
#adjustments of the paths for the Practicum platform:
#orders = pd.read_csv('/datasets/orders_us.csv', sep=',')
#visits = pd.read_csv('/datasets/visits_us.csv', sep=',')
orders = pd.read_csv('./orders_us.csv', sep=',')
visits = pd.read_csv('./visits_us.csv', sep=',')
display(orders)
orders.info()
#changing the date type
orders['date'] = pd.to_datetime(orders['date'], format="%Y.%m.%d")
#checking if there are visitors that got to both groups, A and B
visitors_groups = orders.groupby('visitorId', as_index=False).agg({'group' : pd.Series.nunique})
visitors_duplicated = visitors_groups.query('group>1')
visitors_duplicated['visitorId'].count()
We have 58 visitors that ended up in both groups. We will exclude them from our dataset (according to A/B testing rules, a user can't take part in two groups at once) and then form equal groups A and B using random sampling.
#excluding visitors that got to 2 groups:
orders_filtered = orders.query('visitorId not in @visitors_duplicated.visitorId')
display(orders_filtered)
visitorsA = orders_filtered.query('group=="A"')
display(visitorsA)
visitorsA['visitorId'].nunique()
visitorsB_raw = orders_filtered.query('group=="B"')
display(visitorsB_raw)
visitorsB_raw['visitorId'].nunique()
As group B is bigger than group A (528 vs. 445 unique visitors), we will randomly sample from it the same number of visitors as in group A (445).
import random
visitorsB_raw_ID = visitorsB_raw['visitorId'].unique().tolist()
visitorsB_ID = random.sample(visitorsB_raw_ID, k=445)
display(visitorsB_ID) #we got a list of 445 unique visitor IDs for group B
#setting a df with group B visitors equal in number to group A
visitorsB = visitorsB_raw.query('visitorId in @visitorsB_ID')
display(visitorsB)
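As a side note, random.sample is not seeded above, so the split is not reproducible between runs. A seedable alternative sketch using pandas (the random_state value here is an arbitrary assumption) would be:
# reproducible alternative: sample 445 unique group-B visitors with a fixed seed
visitorsB_ID_alt = (visitorsB_raw['visitorId']
                    .drop_duplicates()
                    .sample(n=445, random_state=42)  # arbitrary seed for reproducibility
                    .tolist())
visitorsB_alt = visitorsB_raw.query('visitorId in @visitorsB_ID_alt')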
#joining the two tables with A visitors and B visitors together
frames = [visitorsA, visitorsB]
orders_filtered = pd.concat(frames)
orders_filtered
#moving on to "visits" df:
display(visits)
visits.info()
#changing the date type
visits['date'] = pd.to_datetime(visits['date'], format="%Y.%m.%d")
We are going to plot cumulative revenue by group and draw conclusions.
# building an array with unique paired date-group values
datesGroups = orders_filtered[['date','group']].drop_duplicates()
display(datesGroups)
# getting aggregated cumulative daily data on orders revenue
ordersAggregated = datesGroups.apply(lambda x: orders_filtered[np.logical_and(orders_filtered['date'] <= x['date'], orders_filtered['group'] == x['group'])].agg({'date' : 'max', 'group' : 'max', 'transactionId' : pd.Series.nunique, 'visitorId' : pd.Series.nunique, 'revenue' : 'sum'}), axis=1).sort_values(by=['date','group'])
ordersAggregated.head()
# getting aggregated cumulative daily data on visits
#the cumulative number of visits in each test group up to the specified date, inclusive
visitsAggregated = datesGroups.apply(lambda x: visits[np.logical_and(visits['date'] <= x['date'], visits['group'] == x['group'])].agg({'date' : 'max', 'group' : 'max', 'visits' : 'sum'}), axis=1).sort_values(by=['date','group'])
visitsAggregated.head()
# merging the two tables into one and giving its columns descriptive names
cumulativeData = ordersAggregated.merge(visitsAggregated, left_on=['date', 'group'], right_on=['date', 'group'])
cumulativeData.columns = ['date', 'group', 'orders', 'buyers', 'revenue', 'visits']
print(cumulativeData.head(5))
We have received cumulative data on revenue per day for every group. Now we are going to visualize it for further analysis.
# DataFrame with cumulative orders and cumulative revenue by day, group A
cumulativeRevenueA = cumulativeData[cumulativeData['group']=='A'][['date','revenue', 'orders']]
# DataFrame with cumulative orders and cumulative revenue by day, group B
cumulativeRevenueB = cumulativeData[cumulativeData['group']=='B'][['date','revenue', 'orders']]
# Plotting the group A revenue graph
plt.figure(figsize=(20,10))
plt.plot(cumulativeRevenueA['date'], cumulativeRevenueA['revenue'], label='A')
# Plotting the group B revenue graph
plt.plot(cumulativeRevenueB['date'], cumulativeRevenueB['revenue'], label='B')
plt.xticks(rotation=90)
plt.title('Cumulative Revenue Graph for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Revenue Amount', fontsize=12)
plt.grid()
plt.legend()
plt.show()
Until 18.08.2019 cumulative revenue for both groups fluctuated slightly around similar figures and was almost equal. After 18.08.2019 the revenue of group A rose significantly and it became the leader. This might have been caused by a single big or unusually expensive order placed in this group. We will plot another graph to investigate.
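A quick way to check the big-order suspicion before plotting (an illustrative aside, not part of the original flow) is to look directly at the largest orders:
# illustrative check: the five largest orders, to spot a one-off revenue spike
print(orders_filtered.sort_values(by='revenue', ascending=False)
      .head(5)[['date', 'group', 'visitorId', 'revenue']])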
#We will plot average purchase size by groups. We'll divide cumulative revenue by the cumulative number of orders:
plt.figure(figsize=(20,10))
plt.plot(cumulativeRevenueA['date'], cumulativeRevenueA['revenue']/cumulativeRevenueA['orders'], label='A')
plt.plot(cumulativeRevenueB['date'], cumulativeRevenueB['revenue']/cumulativeRevenueB['orders'], label='B')
plt.xticks(rotation=90)
plt.title('Average Purchase Size for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Purchase Size', fontsize=12)
plt.grid()
plt.legend()
plt.show()
Average purchase size in group B exceeded that of group A from the very beginning of the experiment. Values for both groups fluctuated strongly during the first half of the experiment; the fluctuations eased after 15/08/2019. After this date the cumulative average order size for both groups stabilized around the same value (105-110), but by the end of the experiment group A slightly surpassed group B. Sharp peaks in the charts can be explained by a few expensive or large orders. We will investigate them later.
We will plot a relative difference graph for the average purchase sizes. To do this we will form a new data frame with cumulative revenue for both groups by merging the two tables. The relative difference will be calculated as the ratio of the cumulative average order size of group B to that of group A, minus 1.
We'll add a horizontal line where the ratio equals zero (the values of both groups are equal).
# gathering the data into one DataFrame
mergedCumulativeRevenue = cumulativeRevenueA.merge(cumulativeRevenueB, left_on='date', right_on='date', how='left', suffixes=['A', 'B'])
# plotting a relative difference graph for the average purchase sizes
plt.figure(figsize=(20,10))
plt.plot(mergedCumulativeRevenue['date'], (mergedCumulativeRevenue['revenueB']/mergedCumulativeRevenue['ordersB'])/(mergedCumulativeRevenue['revenueA']/mergedCumulativeRevenue['ordersA'])-1)
plt.xticks(rotation=90)
plt.grid()
# adding a horizontal reference line at y=0
plt.axhline(y=0, color='black', linestyle=':')
plt.title('Relative Difference graph for the average purchase sizes for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Relative Purchase Size', fontsize=12)
plt.show()
The average purchase size of group B started to exceed group A after 03/08/2019, reaching its peak on 07/08/2019; it then dropped for several days, from 12/08/2019 till 15/08/2019. After that the relative difference stabilized around 0 and slightly dropped after 27/08/2019. During the second half of the experiment the two groups did not show much difference in average purchase size.
We will calculate conversion rate as the ratio of orders to the number of visits for each day. Next we will plot it.
#checking the date range:
mergedCumulativeRevenue['date'].describe()
We will calculate cumulative conversion and then the relative difference between the cumulative conversion rates.
#calculating cumulative conversion as the ratio of orders to the number of visits for each day
cumulativeData['conversion'] = cumulativeData['orders']/cumulativeData['visits']
# selecting data on group A
cumulativeDataA = cumulativeData[cumulativeData['group']=='A']
# selecting data on group B
cumulativeDataB = cumulativeData[cumulativeData['group']=='B']
# plotting the graphs
plt.figure(figsize=(20,10))
plt.plot(cumulativeDataA['date'], cumulativeDataA['conversion'], label='A')
plt.plot(cumulativeDataB['date'], cumulativeDataB['conversion'], label='B')
plt.legend()
plt.xticks(rotation=90)
#plt.axis(['2019-08-01', '2019-08-31', 0, 0.015])
plt.grid()
plt.title('Cumulative Conversion for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Conversion rate', fontsize=12)
plt.show()
The chart demonstrates that group A had a much better conversion rate at the start of the experiment (0.032 vs 0.02), but then it dropped significantly and the difference was not so dramatic afterwards. Fluctuations in the conversion of one group corresponded with fluctuations of the other: they had peaks and troughs on the same dates. The fluctuations eased by the end of the experiment for both groups, and the difference in conversion rates was not big but stable.
#plotting a relative difference graph for the cumulative conversion rates B/A:
mergedCumulativeConversions = cumulativeDataA[['date','conversion']].merge(cumulativeDataB[['date','conversion']], left_on='date', right_on='date', how='left', suffixes=['A', 'B'])
plt.figure(figsize=(20,10))
plt.plot(mergedCumulativeConversions['date'], mergedCumulativeConversions['conversionB']/mergedCumulativeConversions['conversionA']-1, label="Relative gain in conversion in group B as opposed to group A")
plt.legend()
plt.axhline(y=0, color='black', linestyle='--')
#plt.axhline(y=-0.1, color='grey', linestyle='--')
#plt.axis(["2019-08-01", '2019-08-31', -0.6, 0.6])
plt.xticks(rotation=90)
plt.grid()
plt.title('Relative difference graph for B/A groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('rate', fontsize=12)
plt.show()
#plotting a relative difference graph for the cumulative conversion rates A/B:
plt.figure(figsize=(20,10))
plt.plot(mergedCumulativeConversions['date'], mergedCumulativeConversions['conversionA']/mergedCumulativeConversions['conversionB']-1, label="Relative gain in conversion in group A as opposed to group B")
plt.legend()
plt.axhline(y=0, color='black', linestyle='--')
#plt.axhline(y=-0.1, color='grey', linestyle='--')
#plt.axis(["2019-08-01", '2019-08-31', -0.6, 0.6])
plt.xticks(rotation=90)
plt.grid()
plt.title('Relative difference graph for A/B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('rate', fontsize=12)
plt.show()
These charts confirm that the conversion of group A surpasses that of group B. Although the difference eased during the month of the experiment, it is still about 5% higher for group A.
#creating a table with data on users and orders
ordersByUsers = orders_filtered.drop(['group', 'revenue', 'date'], axis=1).groupby('visitorId', as_index=False).agg({'transactionId' : pd.Series.nunique})
ordersByUsers.columns = ['visitorId','orders']
print(ordersByUsers.sort_values(by='orders',ascending=False).head(10))
#plotting chart
x_values = pd.Series(range(0, len(ordersByUsers)))
plt.scatter(x_values, ordersByUsers['orders'])
plt.title('Scatter chart of orders per visitor', fontsize=18)
plt.ylabel('Number of orders', fontsize=12)
plt.grid()
plt.show()
Most of the visitors made 1 order, just a few of them made 2, and only several made 3 orders. To define the share of outliers we will calculate percentiles below.
Now we will look at the number of orders per buyer for every group and its dynamics.
#calculating cumulative orders per buyer for each day
cumulativeData['orders_per_users'] = cumulativeData['orders']/cumulativeData['buyers']
# selecting data on group A
cumulativeDataA = cumulativeData[cumulativeData['group']=='A']
# selecting data on group B
cumulativeDataB = cumulativeData[cumulativeData['group']=='B']
# plotting the scatter chart
plt.figure(figsize=(20,10))
x_values = pd.Series(range(0, len(cumulativeDataA['orders_per_users'])))
plt.scatter(x_values, cumulativeDataA['orders_per_users'], label='A')
plt.scatter(x_values, cumulativeDataB['orders_per_users'], label='B')
plt.title('Scatter chart of orders per user', fontsize=18)
plt.ylabel('Orders/users', fontsize=12)
plt.legend()
plt.grid()
plt.show()
At the beginning group A got 20% more orders per buyer; by the 23rd day of the experiment the values stabilized around 1.05 orders per buyer for both groups.
#Calculate the 95th and 99th percentiles for the number of orders per user.
np.percentile(ordersByUsers['orders'], [95, 99])
#setting a variable for further calculations
too_many = int(np.percentile(ordersByUsers['orders'], 99))
too_many
No more than 5% of users placed more than 1 order, and no more than 1% of users made more than 2 orders. Based on these calculations we can define the point at which a data point becomes an anomaly: visitors with 3 or more orders are outliers.
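To double-check the chosen cut-off, here is a small sketch (not part of the original notebook) computing the actual share of users above it:
# share of users whose number of orders exceeds the 99th-percentile cut-off
share_above = (ordersByUsers['orders'] > too_many).mean()
print("Share of users above the cut-off: {0:.2%}".format(share_above))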
#histogram of order price distribution
plt.hist(orders_filtered['revenue'], log=True)
plt.title('order price distribution', fontsize=18)
plt.xlabel('Revenue per order', fontsize=12)
plt.grid()
plt.show()
We can see that most of the orders bring in up to 1500 in revenue. Still, there are some outliers with revenue over 3000. To investigate them we will build a scatter plot.
#another chart to evaluate order prices:
x_values = pd.Series(range(0, len(orders_filtered['revenue'])))
plt.scatter(x_values, orders_filtered['revenue'])
plt.title('Scatter chart of revenue per order', fontsize=18)
plt.ylabel('Revenue', fontsize=12)
plt.grid()
plt.show()
The second chart confirms that there are only a few orders with revenue over 1000. They can be treated as outliers. To set the upper fence more precisely we will calculate the percentiles as well.
#Calculating the 95th and 99th percentiles of the order prices.
np.percentile(orders_filtered['revenue'], [95, 99])
#setting a variable for further calculations
too_expencive = int(np.percentile(orders_filtered['revenue'], 99))
too_expencive
No more than 5% of orders cost over 401, and no more than 1% of orders brought revenue over 830.30. Thus the point beyond which an order becomes anomalously expensive is 830.30, and orders with revenue over 830.30 can be treated as outliers.
We are going to calculate the statistical significance of the difference in conversion between the groups using the Mann-Whitney U test, because we have independent samples of a quantitative variable (the test is a nonparametric counterpart of Student's t-test) and the data doesn't follow a normal distribution, as the histogram of order prices above shows.
We'll also find the relative difference in conversion between the groups:
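As optional, more formal backing for the non-normality claim, here is a small illustrative sketch (not part of the original analysis) applying the Shapiro-Wilk test from scipy to the order revenues:
# optional sanity check: Shapiro-Wilk test for normality of order revenues
# (a very small p-value suggests the data is unlikely to come from a normal distribution)
w_stat, p_norm = st.shapiro(orders_filtered['revenue'])
print("Shapiro-Wilk p-value: {0:.4f}".format(p_norm))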
#splitting tables with data on two groups:
ordersByUsersA = orders_filtered[orders_filtered['group']=='A'].groupby('visitorId', as_index=False).agg({'transactionId' : pd.Series.nunique})
ordersByUsersA.columns = ['visitorId', 'orders']
ordersByUsersB = orders_filtered[orders_filtered['group']=='B'].groupby('visitorId', as_index=False).agg({'transactionId' : pd.Series.nunique})
ordersByUsersB.columns = ['visitorId', 'orders']
#Forming the samples: for each group, one entry per visit (the order count for buyers plus zeros for visitors who placed no orders)
sampleA = pd.concat([ordersByUsersA['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='A']['visits'].sum() - len(ordersByUsersA['orders'])), name='orders')],axis=0)
sampleB = pd.concat([ordersByUsersB['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='B']['visits'].sum() - len(ordersByUsersB['orders'])), name='orders')],axis=0)
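As a quick illustrative sanity check (not in the original notebook), the length of each sample should equal the total number of visits in its group, since it combines buyers' order counts with zeros for visitors who placed no orders:
# sanity check: sample length should match the total number of visits per group
print(len(sampleA), visits[visits['group']=='A']['visits'].sum())
print(len(sampleB), visits[visits['group']=='B']['visits'].sum())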
Stating the hypotheses:
H0: there is no statistically significant difference in conversion between the groups
H1: the conversion level differs between the groups
The critical statistical significance level is set at 0.05, as it is commonly accepted in the industry for non-multiple testing.
alpha = .05
results = st.mannwhitneyu(sampleA, sampleB)[1]
print("{0:.3f}".format(results))
if results < alpha:
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
We calculated the p-value by applying the mannwhitneyu() criterion.
The p-value is greater than alpha, so at the 5% significance level we fail to reject the null hypothesis: the data does not provide sufficient evidence of a difference. Therefore, we cannot conclude that conversion differs between the groups.
# Calculating the relative difference in conversion for group B vs. group A:
print("{0:.3f}".format(sampleB.mean()/sampleA.mean()-1)) #the relative loss of group B
We can't reject the null hypothesis that there is no statistically significant difference in conversion between the groups. The relative loss of group B is 4.3%.
To calculate the statistical significance of the difference in the segments' average order size, we'll pass the data on revenue to the mannwhitneyu() criterion. We'll also find the relative difference in average order size between the groups.
Stating the hypotheses:
H0: the average order size is the same for both groups
H1: the average order size differs between the groups
The critical statistical significance level is set at 0.05, as it is commonly accepted in the industry for non-multiple testing.
alpha = .05
results = st.mannwhitneyu(orders_filtered[orders_filtered['group']=='A']['revenue'], orders_filtered[orders_filtered['group']=='B']['revenue'])[1]
print("{0:.3f}".format(results))
if results < alpha:
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
#Calculating the relative difference in average order size between the groups (B/A):
print("{0:.3f}".format(orders_filtered[orders_filtered['group']=='B']['revenue'].mean()/orders_filtered[orders_filtered['group']=='A']['revenue'].mean()-1))
The p-value is considerably higher than 0.05, so there is no reason to reject the null hypothesis: we cannot conclude that the average order size differs between the groups.
Nonetheless, the average order size for group B is 3% smaller than that for group A.
To filter the data we will determine the outliers: the IDs of users who placed too many orders or orders that were too expensive. Then we will filter them out of the raw data and calculate the statistical significance the same way as in step 9.
# finding the abnormal users' IDs:
# users that made too many orders
usersWithManyOrders = pd.concat([ordersByUsersA[ordersByUsersA['orders'] > 2]['visitorId'], ordersByUsersB[ordersByUsersB['orders'] > 2]['visitorId']], axis = 0)
#users that paid too much
usersWithExpensiveOrders = orders_filtered.query('revenue > @too_expencive')['visitorId']
#joining the two groups together
abnormalUsers = pd.concat([usersWithManyOrders, usersWithExpensiveOrders], axis = 0).drop_duplicates().sort_values()
print(abnormalUsers.head(5))
print(abnormalUsers.shape) #defining total number of anomalous users.
We received a list of 18 user IDs that either made too many orders (more than 99% of other visitors) or placed orders that were too expensive (paid more than 99% of other visitors).
#filtering out the abnormal users from the raw data:
sampleAFiltered = pd.concat([ordersByUsersA[np.logical_not(ordersByUsersA['visitorId'].isin(abnormalUsers))]['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='A']['visits'].sum() - len(ordersByUsersA['orders'])),name='orders')],axis=0)
sampleBFiltered = pd.concat([ordersByUsersB[np.logical_not(ordersByUsersB['visitorId'].isin(abnormalUsers))]['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='B']['visits'].sum() - len(ordersByUsersB['orders'])),name='orders')],axis=0)
ordersByUsersA.head()
sampleAFiltered.head()
sampleBFiltered
Now we will apply the Mann-Whitney test to the resulting filtered samples.
Keeping the same hypotheses and statistical significance level (0.05):
H0: there is no statistically significant difference in conversion between the groups
H1: the conversion level differs between the groups
alpha = .05
results = st.mannwhitneyu(sampleAFiltered, sampleBFiltered)[1]
print("{0:.3f}".format(results))
if results < alpha:
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
We got a result very close to the previous one. For the data with outliers filtered out, we still fail to reject the null hypothesis at the 5% significance level.
Therefore we cannot conclude that the conversion level differs between the groups.
# Calculating the relative difference in conversion for group B vs. group A:
print("{0:.3f}".format(sampleBFiltered.mean()/sampleAFiltered.mean()-1)) #the relative loss of group B
Thus, for the filtered data we can't reject the null hypothesis that there is no statistically significant difference in conversion between the groups. The relative loss of group B is even smaller now, 2%.
Keeping the stated hypotheses and significance level:
H0: the average order size is the same for both groups
H1: the average order size differs between the groups
alpha = .05
results = st.mannwhitneyu(orders_filtered[np.logical_and(
orders_filtered['group']=='A',
np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'], orders_filtered[np.logical_and(
orders_filtered['group']=='B',
np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'])[1]
print("{0:.3f}".format(results))
if results < alpha:
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
Filtering out the outliers has barely changed the outcome here.
#Calculating the relative difference in average order size between the groups (B/A):
print("{0:.3f}".format(
orders_filtered[np.logical_and(orders_filtered['group']=='B',np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'].mean()/
orders_filtered[np.logical_and(
orders_filtered['group']=='A',
np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'].mean() - 1))
We failed to reject the null hypothesis for the filtered data as well, and the difference between the segments has become even smaller: 2.9% instead of 3.2%.
In the course of the work we have discovered the following facts:
- Neither the raw nor the filtered data revealed any statistically significant differences in conversion between the groups.
- Neither the raw nor the filtered data revealed any statistically significant differences in average order size between the groups.
- The graph showing the difference in average order size between the groups tells us that group B's results got much better at the beginning of the experiment; then, after a period of fluctuations, they stabilized at the same level as group A's. By the end of the experiment they were only about 5% lower than group A's (see the graph here [pic 1](#section_1)).
- The graph showing the difference in conversion between the groups tells us that group B's results are worse and don't seem to be improving significantly (see the graph here [pic 2](#section_2)).
Based on these facts, we can conclude that the test was unsuccessful. The differences in the key metrics of the groups are not statistically significant. We see no use in continuing the experiment, as the calculations and graphs (either this [pic 1](#section_1) or this [pic 2](#section_2)) clearly show that the result has stabilized and the probability that segment B will turn out to be better than segment A is almost nonexistent.