Business goal: analyze trending-video history on YouTube
How often the dashboard will be used: at least once a day
Target dashboard user: video ads planning managers
Dashboard data content:
Parameters by which the data will be grouped:
The nature of the data:
Importance: the trending-video history graph is the most important element, so it should occupy at least half of the dashboard
Data sources for the dashboard: raw data on trending (the trending_vids table)
Database storing aggregate data: additional aggregate tables in the youtube database
Data update interval: once every 24 hours, at midnight UTC
Graphs to be displayed, their order, and the dashboard controls to be included
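The aggregate-table requirement above (raw `trending_vids` rows rolled up into daily aggregates in the `youtube` database) can be sketched with a pandas group-by. This is a minimal sketch under assumptions: the real `trending_vids` schema is not shown here, so the column names (`video_id`, `trending_date`) are hypothetical, chosen to mirror the `trending_by_time` columns used later.

```python
import pandas as pd

# Hypothetical raw schema for trending_vids; the real column names may differ.
raw = pd.DataFrame({
    'video_id':       ['a1', 'a2', 'a3', 'a4'],
    'category_title': ['Music', 'Music', 'Gaming', 'Music'],
    'region':         ['US', 'US', 'US', 'FR'],
    'trending_date':  pd.to_datetime(['2018-01-01'] * 4),
})

# Daily rollup mirroring the trending_by_time aggregate read below:
# one row per (date, category, region) with a video count.
trending_by_time = (
    raw.groupby(['trending_date', 'category_title', 'region'])
       .size()
       .reset_index(name='videos_count')
)
print(trending_by_time)
```

In production this rollup would run as the scheduled midnight-UTC job and write into the aggregate table, rather than being computed in the notebook.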
import pandas as pd
from IPython.display import display
from datetime import datetime
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sqlalchemy import create_engine
# connecting to the database
db_config = {'user': 'practicum_student', # user name
'pwd': 's65BlTKV3faNIGhmvJVzOqhs', # password
'host': 'rc1b-wcoijxj3yxfsf3fs.mdb.yandexcloud.net',
'port': 6432, # connection port
'db': 'data-analyst-youtube-data'} # the name of the database
connection_string = 'postgresql://{user}:{pwd}@{host}:{port}/{db}'.format(**db_config)
engine = create_engine(connection_string)
query = '''
SELECT *
FROM trending_by_time
'''
# pd.read_sql replaces the deprecated pd.io.sql.read_sql; a single call is enough
trending_by_time = pd.read_sql(query, con=engine, index_col='record_id')
trending_by_time.to_csv('trending_by_time.csv', index=False)  # save a local copy (the record_id index is dropped)
trending_by_time.head(5)
trending_by_time.info(memory_usage='deep')
# videos per category and region, summed over the whole period
pivot_countries_category = trending_by_time.pivot_table(
    index='category_title', columns='region',
    values='videos_count', aggfunc='sum'
).reset_index().fillna(0)
pivot_countries_category
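The spec names the trending-video history graph as the dashboard's most important element, and `seaborn`/`matplotlib` are imported but never used above. Below is a minimal sketch of that history plot, assuming `trending_by_time` has a `trending_date` column (an assumption; only `record_id`, `category_title`, `region`, and `videos_count` are confirmed by the code above). A small synthetic frame stands in for the database table so the sketch runs on its own.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for trending_by_time; the real table is read from the DB.
trending_by_time = pd.DataFrame({
    'trending_date':  pd.to_datetime(['2018-01-01', '2018-01-01',
                                      '2018-01-02', '2018-01-02']),
    'category_title': ['Music', 'Gaming', 'Music', 'Gaming'],
    'region':         ['US'] * 4,
    'videos_count':   [10, 5, 12, 7],
})

# Trending history: total trending videos per day, one line per category.
history = trending_by_time.pivot_table(
    index='trending_date', columns='category_title',
    values='videos_count', aggfunc='sum'
)
ax = history.plot(figsize=(10, 5), title='Trending videos over time')
ax.set_xlabel('date')
ax.set_ylabel('videos count')
plt.tight_layout()
plt.savefig('trending_history.png')
```

On the dashboard itself this panel would be built in the BI tool from the aggregate table; the pivot-then-plot pattern is the same.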