In the course of this project we are going to analyse a list of hypotheses and the results of an A/B test.

In Part 1 of the project we will rank the hypotheses that may help boost the revenue of a big online store. To do this we will look into the data stored in the "hypotheses" table and assign a priority to every hypothesis using two approaches (ICE and RICE).

Next, in Part 2 of the project, we are going to analyse the A/B test results collected during a one-month experiment. We will look into the data stored in two tables: orders_us and visits_us. For the purpose of the analysis we will split the collected data into two equal groups and calculate the statistical significance of the difference in order size and in conversion rate between them. This way we will check whether the difference in the key metrics of these groups is due to random chance or to the treatment.

In the end we will conclude whether the experiment was a success and whether we should stop the test or continue collecting data.

In [1]:
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats as st
import seaborn as sns
from IPython.display import display

hypotheses = pd.read_csv('./hypotheses_us.csv', sep=';')

Part 1. Prioritizing Hypotheses

Data study

In [2]:
display(hypotheses)
Hypothesis Reach Impact Confidence Effort
0 Add two new channels for attracting traffic. T... 3 10 8 6
1 Launch your own delivery service. This will sh... 2 5 4 10
2 Add product recommendation blocks to the store... 8 3 7 3
3 Change the category structure. This will incre... 8 3 3 8
4 Change the background color on the main page. ... 3 1 1 1
5 Add a customer review page. This will increase... 3 2 2 3
6 Show banners with current offers and sales on ... 5 3 8 3
7 Add a subscription form to all the main pages.... 10 7 8 5
8 Launch a promotion that gives users discounts ... 1 9 9 5
In [3]:
#lowercasing all column names of the hypotheses df
hypotheses.columns = map(str.lower, hypotheses.columns)
In [4]:
hypotheses.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hypothesis  9 non-null      object
 1   reach       9 non-null      int64 
 2   impact      9 non-null      int64 
 3   confidence  9 non-null      int64 
 4   effort      9 non-null      int64 
dtypes: int64(4), object(1)
memory usage: 388.0+ bytes
In [5]:
hypotheses['hypothesis'].value_counts()
Out[5]:
Change the background color on the main page. This will increase user engagement                                          1
Launch your own delivery service. This will shorten delivery time                                                         1
Change the category structure. This will increase conversion since users will find the products they want more quickly    1
Add product recommendation blocks to the store's site. This will increase conversion and average purchase size            1
Add two new channels for attracting traffic. This will bring 30% more users                                               1
Add a subscription form to all the main pages. This will help you compile a mailing list                                  1
Launch a promotion that gives users discounts on their birthdays                                                          1
Add a customer review page. This will increase the number of orders                                                       1
Show banners with current offers and sales on the main page. This will boost conversion                                   1
Name: hypothesis, dtype: int64

Hypotheses ranking

ICE framework

We are going to calculate the ICE score the following way: (impact * confidence) / effort
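For example, hypothesis no. 8 has impact 9, confidence 9 and effort 5 in the table above, so its score works out as follows (a quick arithmetic check against the ranking below):

# Worked example for hypothesis no. 8 (impact=9, confidence=9, effort=5):
impact, confidence, effort = 9, 9, 5
print((impact * confidence) / effort)  # 16.2, matching the top ICE score below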

In [6]:
#adding the ICE column to the table:
hypotheses['ICE'] = (hypotheses['impact']*hypotheses['confidence'])/hypotheses['effort']

#printing the result:
print(hypotheses[['hypothesis','ICE']].sort_values(by='ICE', ascending=False))
                                          hypothesis        ICE
8  Launch a promotion that gives users discounts ...  16.200000
0  Add two new channels for attracting traffic. T...  13.333333
7  Add a subscription form to all the main pages....  11.200000
6  Show banners with current offers and sales on ...   8.000000
2  Add product recommendation blocks to the store...   7.000000
1  Launch your own delivery service. This will sh...   2.000000
5  Add a customer review page. This will increase...   1.333333
3  Change the category structure. This will incre...   1.125000
4  Change the background color on the main page. ...   1.000000

According to the ICE framework, the most highly ranked hypotheses are nos. 8 and 0: "Launch a promotion that gives users discounts on their birthdays" and "Add two new channels for attracting traffic. This will bring 30% more users".

RICE framework

This method is helpful when it's important to understand how many of our customers a given feature will affect. We are going to calculate the RICE score the following way:

RICE score = (reach * impact * confidence) / effort
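For example, hypothesis no. 7 has reach 10, impact 7, confidence 8 and effort 5, so (a quick arithmetic check against the ranking below):

# Worked example for hypothesis no. 7 (reach=10, impact=7, confidence=8, effort=5):
reach, impact, confidence, effort = 10, 7, 8, 5
print((reach * impact * confidence) / effort)  # 112.0, matching the top RICE score below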

In [7]:
#adding the RICE column to the table:
hypotheses['RICE'] = (hypotheses['reach']*hypotheses['impact']*hypotheses['confidence'])/hypotheses['effort']

#printing the result:
print(hypotheses[['hypothesis','RICE']].sort_values(by='RICE', ascending=False))
                                          hypothesis   RICE
7  Add a subscription form to all the main pages....  112.0
2  Add product recommendation blocks to the store...   56.0
0  Add two new channels for attracting traffic. T...   40.0
6  Show banners with current offers and sales on ...   40.0
8  Launch a promotion that gives users discounts ...   16.2
3  Change the category structure. This will incre...    9.0
1  Launch your own delivery service. This will sh...    4.0
5  Add a customer review page. This will increase...    4.0
4  Change the background color on the main page. ...    3.0

Taking the reach factor into account has changed the ranking dramatically. Now the top-priority hypotheses are nos. 7 and 2: "Add a subscription form to all the main pages. This will help you compile a mailing list" and "Add product recommendation blocks to the store's site. This will increase conversion and average purchase size".

In [8]:
#heat map for the ranked hypotheses:
hypotheses_ranged = hypotheses.drop(['hypothesis','reach', 'impact', 'confidence', 'effort'], axis=1)
sns.set(style='white')
plt.figure(figsize=(20, 3))
plt.title('Ranked hypotheses', fontsize=16)
sns.heatmap(hypotheses_ranged.T, annot=True, fmt='g', linewidths=1, linecolor='grey');

Hypothesis ranking looks different for the two frameworks. If we want to take as many factors into consideration as possible, we should start by testing the top-ranked RICE hypotheses: nos. 7, 2, 0 and 6. This way we will prioritize the hypotheses that will reach the maximum number of customers.

Part 2. A/B Test Analysis

Data study

In [9]:
#adjustments of the paths for the Practicum platform:
#orders = pd.read_csv('/datasets/orders_us.csv', sep=',')
#visits = pd.read_csv('/datasets/visits_us.csv', sep=',')
orders = pd.read_csv('./orders_us.csv', sep=',')
visits = pd.read_csv('./visits_us.csv', sep=',')
In [10]:
display(orders)
transactionId visitorId date revenue group
0 3667963787 3312258926 2019-08-15 30.4 B
1 2804400009 3642806036 2019-08-15 15.2 B
2 2961555356 4069496402 2019-08-15 10.2 A
3 3797467345 1196621759 2019-08-15 155.1 B
4 2282983706 2322279887 2019-08-15 40.5 B
... ... ... ... ... ...
1192 2662137336 3733762160 2019-08-14 100.8 B
1193 2203539145 370388673 2019-08-14 50.1 A
1194 1807773912 573423106 2019-08-14 165.3 A
1195 1947021204 1614305549 2019-08-14 5.5 A
1196 3936777065 2108080724 2019-08-15 3120.1 B

1197 rows × 5 columns

In [11]:
orders.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1197 entries, 0 to 1196
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   transactionId  1197 non-null   int64  
 1   visitorId      1197 non-null   int64  
 2   date           1197 non-null   object 
 3   revenue        1197 non-null   float64
 4   group          1197 non-null   object 
dtypes: float64(1), int64(2), object(2)
memory usage: 37.5+ KB
In [12]:
#changing the date type 
orders['date'] = pd.to_datetime(orders['date'], format="%Y.%m.%d")
In [13]:
#checking if there are visitors that got to both groups, A and B

visitors_groups = orders.groupby('visitorId', as_index=False).agg({'group' : pd.Series.nunique})
visitors_duplicated = visitors_groups.query('group>1')
visitors_duplicated['visitorId'].count()
Out[13]:
58

We have 58 visitors that got into both groups. We will exclude them from our dataset (according to A/B testing rules, a user can't take part in two groups at once) and then form equally sized groups A and B using random sampling.

In [14]:
#excluding visitors that got to 2 groups:

orders_filtered = orders.query('visitorId not in @visitors_duplicated.visitorId')
display(orders_filtered)
transactionId visitorId date revenue group
0 3667963787 3312258926 2019-08-15 30.4 B
1 2804400009 3642806036 2019-08-15 15.2 B
3 3797467345 1196621759 2019-08-15 155.1 B
4 2282983706 2322279887 2019-08-15 40.5 B
5 182168103 935554773 2019-08-15 35.0 B
... ... ... ... ... ...
1191 3592955527 608641596 2019-08-14 255.7 B
1192 2662137336 3733762160 2019-08-14 100.8 B
1193 2203539145 370388673 2019-08-14 50.1 A
1194 1807773912 573423106 2019-08-14 165.3 A
1196 3936777065 2108080724 2019-08-15 3120.1 B

1016 rows × 5 columns

In [15]:
visitorsA = orders_filtered.query('group=="A"')
display(visitorsA)
transactionId visitorId date revenue group
7 2626614568 78758296 2019-08-15 20.1 A
8 1576988021 295230930 2019-08-15 215.9 A
10 3649131742 879864040 2019-08-15 65.7 A
12 2809384911 3213223594 2019-08-15 115.4 A
14 3859315276 1565034404 2019-08-15 170.9 A
... ... ... ... ... ...
1185 948555526 2710193528 2019-08-14 15.2 A
1187 2274188238 132143192 2019-08-14 15.5 A
1188 1692103379 1022829655 2019-08-14 5.7 A
1193 2203539145 370388673 2019-08-14 50.1 A
1194 1807773912 573423106 2019-08-14 165.3 A

468 rows × 5 columns

In [16]:
visitorsA['visitorId'].nunique()
Out[16]:
445
In [17]:
visitorsB_raw = orders_filtered.query('group=="B"')
display(visitorsB_raw)
transactionId visitorId date revenue group
0 3667963787 3312258926 2019-08-15 30.4 B
1 2804400009 3642806036 2019-08-15 15.2 B
3 3797467345 1196621759 2019-08-15 155.1 B
4 2282983706 2322279887 2019-08-15 40.5 B
5 182168103 935554773 2019-08-15 35.0 B
... ... ... ... ... ...
1186 2116724386 2146711261 2019-08-14 240.6 B
1190 38478481 1197745035 2019-08-14 135.3 B
1191 3592955527 608641596 2019-08-14 255.7 B
1192 2662137336 3733762160 2019-08-14 100.8 B
1196 3936777065 2108080724 2019-08-15 3120.1 B

548 rows × 5 columns

In [18]:
visitorsB_raw['visitorId'].nunique()
Out[18]:
528

As group B is bigger than group A (528 vs. 445 unique visitors), we will randomly sample from it the same number of visitors as in group A (445).

In [19]:
import random
visitorsB_raw_ID = visitorsB_raw['visitorId'].values.tolist()
visitorsB_ID = random.sample(visitorsB_raw_ID, k=445)

display(visitorsB_ID) #a list of 445 visitor IDs sampled for group B (drawn from order rows, so some IDs repeat)
[1221478247,
 1075551409,
 3647477940,
 2366521417,
 1918801743,
 ...
 2156256996,
 1261629107,
 1215602524]
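Note that `random.sample` above was applied to the list of order rows (548 entries) rather than to unique visitor IDs, and no seed was set, so some IDs repeat and the draw is not reproducible. A minimal alternative sketch (not the code that produced the outputs in this notebook) that samples 445 unique IDs reproducibly could look like this; the seed value is arbitrary:

import random

# Alternative sketch (not used for the outputs shown here): sample 445 unique
# visitor IDs from group B with a fixed seed so the split can be reproduced.
random.seed(42)  # arbitrary seed, chosen only for reproducibility
unique_b_ids = visitorsB_raw['visitorId'].unique().tolist()
visitorsB_ID_alt = random.sample(unique_b_ids, k=445)
print(len(visitorsB_ID_alt), len(set(visitorsB_ID_alt)))  # 445 445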
In [20]:
#building a df with orders placed by the sampled group B visitors

visitorsB = visitorsB_raw.query('visitorId == @visitorsB_ID')
display(visitorsB)
transactionId visitorId date revenue group
0 3667963787 3312258926 2019-08-15 30.4 B
1 2804400009 3642806036 2019-08-15 15.2 B
3 3797467345 1196621759 2019-08-15 155.1 B
5 182168103 935554773 2019-08-15 35.0 B
6 398296753 2900797465 2019-08-15 30.6 B
... ... ... ... ... ...
1186 2116724386 2146711261 2019-08-14 240.6 B
1190 38478481 1197745035 2019-08-14 135.3 B
1191 3592955527 608641596 2019-08-14 255.7 B
1192 2662137336 3733762160 2019-08-14 100.8 B
1196 3936777065 2108080724 2019-08-15 3120.1 B

451 rows × 5 columns

In [21]:
#joining the two tables with A visitors and B visitors together
frames = [visitorsA, visitorsB]
orders_filtered = pd.concat(frames)
orders_filtered
Out[21]:
transactionId visitorId date revenue group
7 2626614568 78758296 2019-08-15 20.1 A
8 1576988021 295230930 2019-08-15 215.9 A
10 3649131742 879864040 2019-08-15 65.7 A
12 2809384911 3213223594 2019-08-15 115.4 A
14 3859315276 1565034404 2019-08-15 170.9 A
... ... ... ... ... ...
1186 2116724386 2146711261 2019-08-14 240.6 B
1190 38478481 1197745035 2019-08-14 135.3 B
1191 3592955527 608641596 2019-08-14 255.7 B
1192 2662137336 3733762160 2019-08-14 100.8 B
1196 3936777065 2108080724 2019-08-15 3120.1 B

919 rows × 5 columns

In [22]:
#moving on to "visits" df:
display(visits)
date group visits
0 2019-08-01 A 719
1 2019-08-02 A 619
2 2019-08-03 A 507
3 2019-08-04 A 717
4 2019-08-05 A 756
... ... ... ...
57 2019-08-27 B 720
58 2019-08-28 B 654
59 2019-08-29 B 531
60 2019-08-30 B 490
61 2019-08-31 B 718

62 rows × 3 columns

In [23]:
visits.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    62 non-null     object
 1   group   62 non-null     object
 2   visits  62 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 1.0+ KB
In [24]:
#changing the date type 
visits['date'] = pd.to_datetime(visits['date'], format="%Y.%m.%d")

Results study

1. Cumulative revenue by group

We are going to graph cumulative revenue by group and make conclusions.

In [25]:
# building an array with unique paired date-group values 
datesGroups = orders_filtered[['date','group']].drop_duplicates()
display(datesGroups)
date group
7 2019-08-15 A
45 2019-08-16 A
55 2019-08-01 A
86 2019-08-22 A
124 2019-08-17 A
... ... ...
930 2019-08-31 B
1043 2019-08-11 B
1067 2019-08-12 B
1114 2019-08-13 B
1149 2019-08-14 B

62 rows × 2 columns

In [26]:
# getting aggregated cumulative daily data on orders revenue
ordersAggregated = datesGroups.apply(lambda x: orders_filtered[np.logical_and(orders_filtered['date'] <= x['date'], orders_filtered['group'] == x['group'])].agg({'date' : 'max', 'group' : 'max', 'transactionId' : pd.Series.nunique, 'visitorId' : pd.Series.nunique, 'revenue' : 'sum'}), axis=1).sort_values(by=['date','group'])
ordersAggregated.head()
Out[26]:
date group transactionId visitorId revenue
55 2019-08-01 A 23 19 2266.6
66 2019-08-01 B 12 12 650.6
175 2019-08-02 A 42 36 3734.9
173 2019-08-02 B 31 30 2772.6
291 2019-08-03 A 66 60 5550.1
In [27]:
# getting aggregated cumulative daily data on visits  
#the number of distinct visits in the test group up to the specified date, inclusive
visitsAggregated = datesGroups.apply(lambda x: visits[np.logical_and(visits['date'] <= x['date'], visits['group'] == x['group'])].agg({'date' : 'max', 'group' : 'max', 'visits' : 'sum'}), axis=1).sort_values(by=['date','group'])
visitsAggregated.head()
Out[27]:
date group visits
55 2019-08-01 A 719
66 2019-08-01 B 713
175 2019-08-02 A 1338
173 2019-08-02 B 1294
291 2019-08-03 A 1845
In [28]:
# merging the two tables into one and giving its columns descriptive names
cumulativeData = ordersAggregated.merge(visitsAggregated, left_on=['date', 'group'], right_on=['date', 'group'])
cumulativeData.columns = ['date', 'group', 'orders', 'buyers', 'revenue', 'visits']

print(cumulativeData.head(5)) 
        date group  orders  buyers  revenue  visits
0 2019-08-01     A      23      19   2266.6     719
1 2019-08-01     B      12      12    650.6     713
2 2019-08-02     A      42      36   3734.9    1338
3 2019-08-02     B      31      30   2772.6    1294
4 2019-08-03     A      66      60   5550.1    1845

We have obtained cumulative daily revenue data for each group. Now we are going to visualize it for further analysis.

In [29]:
# DataFrame with cumulative orders and cumulative revenue by day, group A
cumulativeRevenueA = cumulativeData[cumulativeData['group']=='A'][['date','revenue', 'orders']]

# DataFrame with cumulative orders and cumulative revenue by day, group B
cumulativeRevenueB = cumulativeData[cumulativeData['group']=='B'][['date','revenue', 'orders']]

# Plotting the group A revenue graph 
plt.figure(figsize=(20,10))
plt.plot(cumulativeRevenueA['date'], cumulativeRevenueA['revenue'], label='A')

# Plotting the group B revenue graph 
plt.plot(cumulativeRevenueB['date'], cumulativeRevenueB['revenue'], label='B')

plt.xticks(rotation=90)
plt.title('Cumulative Revenue Graph for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Revenue Amount', fontsize=12)
plt.grid()
plt.legend() 
plt.show()

From the start of the experiment cumulative revenue for both groups fluctuated slightly around the same figures and was almost equal. After approximately 18/08/2019, however, the revenue of group A rose significantly and group A took the lead, while group B fell behind. This jump might have been caused by a single big or unusually expensive order in that group. We will plot another graph to investigate the situation.
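A quick way to check this guess before plotting further is to look at the largest orders in each group (a minimal sketch on the already filtered data):

# Sketch: the three most expensive orders in each group; a single very large
# order would show up here and explain a jump in cumulative revenue.
top_orders = (orders_filtered
              .sort_values('revenue', ascending=False)
              .groupby('group')
              .head(3))
print(top_orders[['date', 'group', 'visitorId', 'revenue']])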

2. Cumulative average order size by group

In [30]:
#We will plot average purchase size by groups. We'll divide cumulative revenue by the cumulative number of orders:
plt.figure(figsize=(20,10))
plt.plot(cumulativeRevenueA['date'], cumulativeRevenueA['revenue']/cumulativeRevenueA['orders'], label='A')
plt.plot(cumulativeRevenueB['date'], cumulativeRevenueB['revenue']/cumulativeRevenueB['orders'], label='B')

plt.xticks(rotation=90)

plt.title('Average Purchase Size for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Purchase Size', fontsize=12)
plt.grid()
plt.legend() 
plt.show()

The average purchase size of group B exceeded that of group A from the very beginning of the experiment. Both groups fluctuated strongly during the first half of the experiment; the fluctuations eased after 15/08/2019, and after that date the cumulative average order size of both groups stabilized around the same value (105-110). By the end of the experiment, however, group A slightly surpassed group B in average purchase size. The sharp peaks in the chart can be explained by a few expensive or large orders; we will investigate them later.

3. Relative difference in cumulative average order size for group B compared with group A.


We will plot a relative difference graph for the average purchase sizes. To do this we will form a new data frame with cumulative revenue for both groups by merging the two tables. The relative difference will be calculated as the ratio of group B's cumulative average order size to group A's, minus one.

We'll add a horizontal line at zero (where the values of both groups are equal).

pic 1

In [31]:
# gathering the data into one DataFrame
mergedCumulativeRevenue = cumulativeRevenueA.merge(cumulativeRevenueB, left_on='date', right_on='date', how='left', suffixes=['A', 'B'])

# plotting a relative difference graph for the average purchase sizes
plt.figure(figsize=(20,10))
plt.plot(mergedCumulativeRevenue['date'], (mergedCumulativeRevenue['revenueB']/mergedCumulativeRevenue['ordersB'])/(mergedCumulativeRevenue['revenueA']/mergedCumulativeRevenue['ordersA'])-1)
plt.xticks(rotation=90)
plt.grid()
# adding the X axis
plt.axhline(y=0, color='black', linestyle=':')

plt.title('Relative Difference graph for the average purchase sizes for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Relative Purchase Size', fontsize=12)

plt.show()

The average purchase size of group B started to exceed group A's after 03/08/2019, reaching its peak on 07/08/2019; it then declined for several days, from 12/08/2019 till 15/08/2019. After that date the relative difference stabilized around 0 and dropped slightly after 27/08/2019. During the second half of the experiment the two groups did not show much difference in average purchase size.

4. Conversion rate per group

We will calculate the conversion rate as the ratio of orders to the number of visits for each day. Then we will plot it.

In [32]:
#checking the date range:
mergedCumulativeRevenue['date'].describe()
Out[32]:
count                      31
unique                     31
top       2019-08-02 00:00:00
freq                        1
first     2019-08-01 00:00:00
last      2019-08-31 00:00:00
Name: date, dtype: object

We will calculate cumulative conversion and then the relative difference between the cumulative conversion rates.

In [33]:
#calculating cumulative conversion as the ratio of orders to the number of visits for each day
cumulativeData['conversion'] = cumulativeData['orders']/cumulativeData['visits']

# selecting data on group A 
cumulativeDataA = cumulativeData[cumulativeData['group']=='A']

# selecting data on group B
cumulativeDataB = cumulativeData[cumulativeData['group']=='B']

# plotting the graphs
plt.figure(figsize=(20,10))
plt.plot(cumulativeDataA['date'], cumulativeDataA['conversion'], label='A')
plt.plot(cumulativeDataB['date'], cumulativeDataB['conversion'], label='B')
plt.legend()

plt.xticks(rotation=90)
#plt.axis(['2019-08-01', '2019-08-31', 0, 0.015])
plt.grid()


plt.title('Cumulative Conversion for A and B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('Conversion rate', fontsize=12)
plt.show()

The chart shows that group A had a much better conversion rate at the start of the experiment (0.032 vs 0.02), but it then dropped significantly and the difference was not so dramatic afterwards. Fluctuations in the conversion of one group corresponded with fluctuations in the other: they had peaks and troughs on the same dates. The fluctuations eased for both groups by the end of the experiment; the difference in conversion rates was not big, but it was stable.

pic 2

In [34]:
#plotting a relative difference graph for the cumulative conversion rates B/A:
mergedCumulativeConversions = cumulativeDataA[['date','conversion']].merge(cumulativeDataB[['date','conversion']], left_on='date', right_on='date', how='left', suffixes=['A', 'B'])
plt.figure(figsize=(20,10))
plt.plot(mergedCumulativeConversions['date'], mergedCumulativeConversions['conversionB']/mergedCumulativeConversions['conversionA']-1, label="Relative gain in conversion in group B as opposed to group A")
plt.legend()

plt.axhline(y=0, color='black', linestyle='--') 
#plt.axhline(y=-0.1, color='grey', linestyle='--') 
#plt.axis(["2019-08-01", '2019-08-31', -0.6, 0.6])

plt.xticks(rotation=90)
plt.grid()

plt.title('Relative difference graph for B/A groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('rate', fontsize=12)
plt.show()
In [35]:
#plotting a relative difference graph for the cumulative conversion rates A/B:
plt.figure(figsize=(20,10))
plt.plot(mergedCumulativeConversions['date'], mergedCumulativeConversions['conversionA']/mergedCumulativeConversions['conversionB']-1, label="Relative gain in conversion in group A as opposed to group B")
plt.legend()

plt.axhline(y=0, color='black', linestyle='--') 
#plt.axhline(y=-0.1, color='grey', linestyle='--') 
#plt.axis(["2019-08-01", '2019-08-31', -0.6, 0.6])

plt.xticks(rotation=90)
plt.grid()

plt.title('Relative difference graph for A/B groups', fontsize=18)
plt.xlabel('Time period', fontsize=12)
plt.ylabel('rate', fontsize=12)
plt.show()

These charts show that the conversion of group A surpasses that of group B. Although the difference narrowed during the month of the experiment, it is still about 5% higher for group A.

5. Number of orders per user

In [36]:
#creating a table with data on users and orders
ordersByUsers = orders_filtered.drop(['group', 'revenue', 'date'], axis=1).groupby('visitorId', as_index=False).agg({'transactionId' : pd.Series.nunique})
ordersByUsers.columns = ['visitorId','orders']
print(ordersByUsers.sort_values(by='orders',ascending=False).head(10)) 
      visitorId  orders
820  3967698036       3
122   611059232       3
803  3908431265       3
424  2108163459       3
617  2988190573       3
44    249864742       3
564  2742574263       3
363  1772171838       2
846  4109358064       2
102   522292794       2
In [37]:
#plotting chart
x_values = pd.Series(range(0, len(ordersByUsers)))
plt.scatter(x_values, ordersByUsers['orders'])
plt.title('Scatter chart of orders per visitor', fontsize=18)

plt.ylabel('Number of orders', fontsize=12)

plt.grid()
plt.show()

Most of the visitors made one order, a few made two, and only a handful made three. To define the share of outliers we will calculate percentiles below.

Now we will look at the number of orders per buyer for each group and its dynamics.

In [38]:
#calculating the cumulative number of orders per buyer for each day
cumulativeData['orders_per_users'] = cumulativeData['orders']/cumulativeData['buyers']

# selecting data on group A 
cumulativeDataA = cumulativeData[cumulativeData['group']=='A']

# selecting data on group B
cumulativeDataB = cumulativeData[cumulativeData['group']=='B']

# plotting the scatter chart
plt.figure(figsize=(20,10))
x_values = pd.Series(range(0, len(cumulativeDataA['orders_per_users'])))
plt.scatter(x_values, cumulativeDataA['orders_per_users'], label='A')
plt.scatter(x_values, cumulativeDataB['orders_per_users'], label='B')
plt.title('Scatter chart of orders per user', fontsize=18)

plt.ylabel('Orders/users', fontsize=12)
plt.legend()

plt.grid()
plt.show()

At the beginning group A got about 20% more orders per buyer; by the 23rd day of the experiment the values stabilized around 1.05 orders per buyer for both groups.

6. 95th and 99th percentiles for the number of orders per user. Defining point of anomaly.

In [39]:
#Calculate the 95th and 99th percentiles for the number of orders per user. 
np.percentile(ordersByUsers['orders'], [95, 99])
Out[39]:
array([1., 2.])
In [40]:
#setting a variable for further calculations
too_many = int(np.percentile(ordersByUsers['orders'], [99]))
too_many
Out[40]:
2

No more than 5% of users placed more than one order, and no more than 1% of users placed more than two orders. Based on these calculations we can define the point at which a data point becomes an anomaly: visitors with three or more orders are outliers.

7. Scatter chart of order prices

In [41]:
#histogram of order price distribution
plt.hist(orders_filtered['revenue'], log=True) 
plt.title('order price distribution', fontsize=18)
plt.xlabel('Revenue per order', fontsize=12)
plt.grid()
plt.show()

We can see that most orders bring in up to 1,500. Still, there are a few outliers with revenue over 3,000. To investigate them we will build a scatter plot.

In [42]:
#another chart to evaluate order prices:
x_values = pd.Series(range(0, len(orders_filtered['revenue'])))
plt.scatter(x_values, orders_filtered['revenue'])
plt.title('Scatter chart of revenue per order', fontsize=18)

plt.ylabel('Revenue', fontsize=12)

plt.grid()
plt.show()

The second chart confirms that there are only a few orders with revenue over 1,000; they can be treated as outliers. To define the upper fence more precisely we will calculate the percentiles as well.

8. 95th and 99th percentiles of order prices. Point of anomaly.

In [43]:
#Calculating the 95th and 99th percentiles for the order prices.
np.percentile(orders_filtered['revenue'], [95, 99])
Out[43]:
array([410.5, 830.3])
In [44]:
#setting a variable for further calculations
too_expencive = int(np.percentile(orders_filtered['revenue'], [99]))
too_expencive
Out[44]:
830

No more than 5% of orders cost over 410.5, and no more than 1% brought revenue over 830.3. Thus we can state that the point after which an order becomes anomalously expensive is 830.3. Consequently, orders with revenue over 830.3 can be viewed as outliers.
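As a cross-check on this threshold (a minimal sketch, not part of the original analysis), Tukey's upper fence, Q3 + 1.5 * IQR, gives another common reference point for anomalously expensive orders:

# Sketch: Tukey's upper fence (Q3 + 1.5 * IQR) for order prices, an alternative
# outlier threshold to compare with the 99th-percentile cut-off of 830.3.
q1, q3 = np.percentile(orders_filtered['revenue'], [25, 75])
iqr = q3 - q1
print('Upper fence:', q3 + 1.5 * iqr)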

9. Statistical significance of the difference in conversion between the groups using the raw data.

We are going to calculate the statistical significance of the difference in conversion between the groups using the Mann-Whitney U test, the nonparametric counterpart of Student's t-test, because we have independent samples of a quantitative variable that does not follow a normal distribution, as the histogram above shows.
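If we wanted to back up the normality claim formally (a hedged sketch, not part of the original notebook), a Shapiro-Wilk test on the per-order revenue of each group would be expected to return a very small p-value, confirming the departure from normality:

# Sketch: Shapiro-Wilk normality test on per-order revenue for each group.
# A p-value below 0.05 would indicate the data is not normally distributed,
# supporting the choice of the nonparametric Mann-Whitney U test.
for grp in ['A', 'B']:
    stat, p = st.shapiro(orders_filtered[orders_filtered['group'] == grp]['revenue'])
    print(f"Group {grp}: W={stat:.3f}, p-value={p:.2e}")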

We'll also find the relative difference in conversion between the groups:

In [45]:
#splitting tables with data on two groups:
ordersByUsersA = orders_filtered[orders_filtered['group']=='A'].groupby('visitorId', as_index=False).agg({'transactionId' : pd.Series.nunique})
ordersByUsersA.columns = ['visitorId', 'orders']

ordersByUsersB = orders_filtered[orders_filtered['group']=='B'].groupby('visitorId', as_index=False).agg({'transactionId' : pd.Series.nunique})
ordersByUsersB.columns = ['visitorId', 'orders'] 
In [46]:
#Combining the two tables with data on visits to form samples
sampleA = pd.concat([ordersByUsersA['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='A']['visits'].sum() - len(ordersByUsersA['orders'])), name='orders')],axis=0)

sampleB = pd.concat([ordersByUsersB['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='B']['visits'].sum() - len(ordersByUsersB['orders'])), name='orders')],axis=0)


Stating the hypotheses:

H0: there is no statistically significant difference in conversion between the groups

H1: the conversion rate differs between the groups

A critical statistical significance level will be set at 0.05 as it is a commonly accepted one in the industry for non-multiple testing.

In [47]:
alpha = .05 
 
results =  st.mannwhitneyu(sampleA, sampleB)[1] 

print("{0:.3f}".format(results)) 
 
if (results< alpha):
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
0.267
We can't reject the null hypothesis


We calculated the p-value using the mannwhitneyu() test. The p-value is greater than alpha, so at the 5% significance level we fail to reject the null hypothesis: the data does not provide sufficient evidence that conversion differs between the groups.

In [48]:
# Calculating the relative conversion gain for group B/A:

print("{0:.3f}".format(sampleB.mean()/sampleA.mean()-1)) #the relative loss of group B
-0.045

We can't reject the null hypothesis that there is no statistically significant difference in conversion between the groups. The relative loss of group B is about 4.5%.
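As a quick sanity check (a minimal sketch using the samples built above), the absolute conversion levels behind this relative figure can be printed directly. Each sample holds one entry per buyer (their number of orders) padded with zeros up to the group's total number of visits, so its mean is orders per visit:

# Sketch: overall conversion (orders per visit) for each group.
conv_A = sampleA.mean()
conv_B = sampleB.mean()
print(f"Conversion A: {conv_A:.4f}, Conversion B: {conv_B:.4f}, "
      f"relative difference B/A: {conv_B / conv_A - 1:.3f}")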

10. Statistical significance of the difference in average order size between the groups using the raw data

To calculate the statistical significance of the difference in the segments' average order size, we'll pass the data on revenue to the mannwhitneyu() test. We'll also find the relative difference in average order size between the groups.


Stating the hypotheses:

H0: average order size between the groups is the same

H1: average order size differs between the groups

A critical statistical significance level will be set to 0.05 as it is a commonly accepted one in the industry for non-multiple testing.

In [49]:
alpha = .05 
 
results =  st.mannwhitneyu(orders_filtered[orders_filtered['group']=='A']['revenue'], orders_filtered[orders_filtered['group']=='B']['revenue'])[1] 

print("{0:.3f}".format(results)) 
 
if (results< alpha):
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
0.158
We can't reject the null hypothesis
In [50]:
#Calculating the relative difference in average order size between the groups (B/A):
print("{0:.3f}".format(orders_filtered[orders_filtered['group']=='B']['revenue'].mean()/orders_filtered[orders_filtered['group']=='A']['revenue'].mean()-1)) 
0.320

The p-value is considerably higher than 0.05, so there is no reason to reject the null hypothesis; we cannot conclude that average order size differs between the groups.
Nonetheless, on the raw data the average order size for group B is about 32% larger than that of group A, most likely driven by a few very expensive orders.

11. Statistical significance of the difference in conversion between the groups using the filtered data

In order to filter the data we will identify the outliers: the IDs of users who made too many orders or placed orders that were too expensive. Then we will filter them out of the raw data and calculate statistical significance the same way as in step 9.

In [51]:
# finding the abnormal users' IDs:
# users that made too many orders
usersWithManyOrders = pd.concat([ordersByUsersA[ordersByUsersA['orders'] > 2]['visitorId'], ordersByUsersB[ordersByUsersB['orders'] > 2]['visitorId']], axis = 0)
#users that paid too much
usersWithExpensiveOrders = orders_filtered.query('revenue > @too_expencive')['visitorId']
#joining the two groups together
abnormalUsers = pd.concat([usersWithManyOrders, usersWithExpensiveOrders], axis = 0).drop_duplicates().sort_values()
print(abnormalUsers.head(5))
print(abnormalUsers.shape) #defining total number of anomalous users.
1099    148427295
22      249864742
58      611059232
949     887908475
744     888512513
Name: visitorId, dtype: int64
(19,)

We received a list of 19 user IDs belonging to visitors who either made too many orders (more than 99% of other visitors) or placed orders that were too expensive (paid more than 99% of other visitors).

In [52]:
#filtering out the abnormal users from the raw data:

sampleAFiltered = pd.concat([ordersByUsersA[np.logical_not(ordersByUsersA['visitorId'].isin(abnormalUsers))]['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='A']['visits'].sum() - len(ordersByUsersA['orders'])),name='orders')],axis=0)

sampleBFiltered = pd.concat([ordersByUsersB[np.logical_not(ordersByUsersB['visitorId'].isin(abnormalUsers))]['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='B']['visits'].sum() - len(ordersByUsersB['orders'])),name='orders')],axis=0)
In [53]:
ordersByUsersA.head()
Out[53]:
visitorId orders
0 11685486 1
1 54447517 1
2 66685450 1
3 78758296 1
4 85103373 1
In [54]:
df_2=ordersByUsersA.head()
In [55]:
df_2.head()
Out[55]:
visitorId orders
0 11685486 1
1 54447517 1
2 66685450 1
3 78758296 1
4 85103373 1
In [56]:
sampleAFiltered_2 = pd.concat([df_2[np.logical_not(ordersByUsersA['visitorId'].isin(abnormalUsers))]['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='A']['visits'].sum() - len(df_2['orders'])),name='orders')],axis=0)
<ipython-input-56-552f42d04b33>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  sampleAFiltered_2 = pd.concat([df_2[np.logical_not(ordersByUsersA['visitorId'].isin(abnormalUsers))]['orders'],pd.Series(0, index=np.arange(visits[visits['group']=='A']['visits'].sum() - len(df_2['orders'])),name='orders')],axis=0)
In [57]:
sampleAFiltered_2.head(20)
Out[57]:
0     1
1     1
2     1
3     1
4     1
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
Name: orders, dtype: int64
In [58]:
visits.head()
Out[58]:
date group visits
0 2019-08-01 A 719
1 2019-08-02 A 619
2 2019-08-03 A 507
3 2019-08-04 A 717
4 2019-08-05 A 756
In [59]:
sampleAFiltered.head()
Out[59]:
0    1
1    1
2    1
3    1
4    1
Name: orders, dtype: int64
In [60]:
sampleBFiltered
Out[60]:
0        1
1        1
2        1
3        1
4        1
        ..
18480    0
18481    0
18482    0
18483    0
18484    0
Name: orders, Length: 18909, dtype: int64

Now we will apply the Mann-Whitney test to the resulting samples.


Keeping the same hypotheses and statistical significance level (0.05):

H0: there is no statistically significant difference in conversion between the groups

H1: the conversion rate differs between the groups

In [61]:
alpha = .05 
 
results =  st.mannwhitneyu(sampleAFiltered, sampleBFiltered)[1] 

print("{0:.3f}".format(results)) 
 
if (results< alpha):
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
0.325
We can't reject the null hypothesis


The result is very close to the previous one: with the outliers filtered out, we still fail to reject the null hypothesis at the 5% significance level.

Therefore, we cannot conclude that the conversion rates of the two groups differ.

In [62]:
# Calculating the relative conversion gain for group B/A:

print("{0:.3f}".format(sampleBFiltered.mean()/sampleAFiltered.mean()-1)) #the relative loss of group B
-0.023

Thus for the filtered data we also can't reject the null hypothesis that there is no statistically significant difference in conversion between the groups. The relative loss of group B is even smaller now, about 2.3%.

12. Statistical significance of the difference in average order size between the groups using the filtered data.


Keeping the stated hypotheses and significance level:

H0: average order size between the groups is the same

H1: average order size differs between the groups

In [63]:
alpha = .05 
 
results =  st.mannwhitneyu(orders_filtered[np.logical_and(
        orders_filtered['group']=='A',
        np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'], orders_filtered[np.logical_and(
        orders_filtered['group']=='B',
        np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'])[1] 

print("{0:.3f}".format(results)) 
 
if (results< alpha):
    print("We reject the null hypothesis")
else:
    print("We can't reject the null hypothesis")
0.143
We can't reject the null hypothesis

Filtering out the outliers has barely changed the outcome here.

In [64]:
#Calculating the relative difference in average order size between the groups (B/A):

print("{0:.3f}".format(
    orders_filtered[np.logical_and(orders_filtered['group']=='B',np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'].mean()/
    orders_filtered[np.logical_and(
        orders_filtered['group']=='A',
        np.logical_not(orders_filtered['visitorId'].isin(abnormalUsers)))]['revenue'].mean() - 1)) 
-0.070

We failed to reject the null hypothesis for the filtered data as well. With the outliers removed, the average order size for group B is about 7% lower than group A's, compared with 32% higher on the raw data.

13. Making a decision based on the test results. (The possible decisions are: 1. Stop the test and consider one of the groups the leader. 2. Stop the test and conclude that there is no difference between the groups. 3. Continue the test.)


In the course of this work we have established the following facts:

- Neither raw nor filtered data revealed any statistically significant differences in conversion between the groups.
- Neither raw nor filtered data revealed any statistically significant differences in average order size between the groups.
- The graph showing the difference in average order size between the groups tells us that group B's results improved considerably at the beginning of the experiment; then, after a period of fluctuations, they stabilized at about the same level as group A's. By the end of the experiment they were only about 5% lower than group A's (see the graph here [pic 1](#section_1)).


- The graph showing the difference in conversion between the groups tells us that group B's results are worse and don't seem to be improving significantly (see the graph here [pic 2](#section_2)).


Based on these facts, we can conclude that the test was unsuccessful: the differences in the key metrics of the groups are not statistically significant. We see no point in continuing the experiment, as the calculations and graphs (either this [pic 1](#section_1) or this [pic 2](#section_2)) clearly show that the result has stabilized and the probability that segment B will turn out to be better than segment A is close to zero.

In [ ]: