A Beginner's Guide to Data Visualization in Python



 BY TEJAS





We all know how important data is in today's world. For example, Facebook uses data about its users to decide which advertisements to show them.

Introduction

When we see data in the form of pictures or graphs, it becomes much easier to understand. In simple words, "Data Visualization is the representation of data in the form of pictures or graphs".




For data visualization in Python, we use "pandas" and "matplotlib". There are many alternatives to "matplotlib", such as Seaborn, ggplot, Pygal, Bokeh, and so on.
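pandas and matplotlib work together: a DataFrame's plot() method draws its charts with matplotlib and returns a matplotlib Axes, so the usual matplotlib labelling calls still apply. A minimal sketch, reusing the roll numbers and marks from the student data that appears later in this tutorial:

```python
import matplotlib.pyplot as plt
import pandas as pd

# roll numbers and marks from the student data used later in this tutorial
df = pd.DataFrame({
    "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
    "marks": [85, 90, 94, 87, 60, 77, 56, 40, 85, 60],
})

# DataFrame.plot draws the bar chart with matplotlib and returns the Axes it drew on
ax = df.plot(x="rollno", y="marks", kind="bar", color="green", legend=False)

# the returned Axes is labelled with ordinary matplotlib calls
ax.set_xlabel("Student Roll No")
ax.set_ylabel("Marks")
ax.set_title("XYZ School")

plt.show()
```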








Simple Program


This simple program shows the data of the students on a graph using "pandas" and "matplotlib". It gives you a first idea of what data visualization actually is.


Here is the code of the program:

                
    # importing modules
    import pandas as pd
    import matplotlib.pyplot as plt

    # take student data
    studentdata = {
        "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
        "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
        "class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
        "birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
        "sex": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"]
    }

    # create data frame
    df = pd.DataFrame(studentdata)

    # extract roll no and name into x and y
    x = df['rollno']
    y = df['name']

    # create bar graph
    plt.bar(x, y, label='Student Data', color='green')

    # set x and y axis labels
    plt.xlabel('Student Roll No')
    plt.ylabel('Student Names')

    # set school name
    plt.title('XYZ School')

    # show legend
    plt.legend()

    # show graph
    plt.show()
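In the program above, the names are passed to plt.bar() as bar heights. matplotlib treats those strings as categories, so each bar simply reaches up to the position of its student's name on the y axis. If you would rather use the names as labels with a numeric bar length, one option is a horizontal bar chart. In this sketch, plt.barh() puts the names on the y axis and draws each bar with a length equal to the roll number; sorting by roll number first is only a readability choice made here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# the same roll numbers and names as in the program above
studentdata = {
    "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
    "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra",
             "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
}
df = pd.DataFrame(studentdata)

# sort by roll number so the bars appear in roll-number order
df = df.sort_values("rollno")

# horizontal bars: one name per row on the y axis, bar length = roll number
bars = plt.barh(df["name"].tolist(), df["rollno"].tolist(), label="Student Data", color="green")

plt.xlabel("Student Roll No")
plt.ylabel("Student Names")
plt.title("XYZ School")
plt.legend()
plt.tight_layout()  # leave room for the long names on the y axis
plt.show()
```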

                
                





Pandas_Profiling


There is a module called "pandas_profiling" that is very powerful for analyzing data in Python. It shows all the information about the data, such as how many cells or rows the data has, how many variables there are and what types they are, and much more. (Newer versions of this package are published under the name "ydata-profiling".)

                       
    # importing modules
    import pandas as pd
    import matplotlib.pyplot as plt
    import pandas_profiling as pp
                    
    # take student data
    studentdata = {
        "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
        "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
        "class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
        "birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
        "sex": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"]
    }
                    
    # create data frame
    df = pd.DataFrame(studentdata)
                    
    # call ProfileReport object
    profile = pp.ProfileReport(df)

    # generate into html file
    profile.to_file("output.html")
                    
                       
                   

pandas_profiling summarizes all of this information about the data in one report, which makes the data easier to analyze.
Here are some screenshots of the report generated by pandas_profiling.


Screenshot: Overview Report

Screenshot: Correlation

Screenshot: Interactions


The pandas_profiling report gives a full, in-depth analysis of the data, including the number of variables, the number of observations, the number of cells, and much more. Open output.html in a web browser and click on any section you want to explore in depth.
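If pandas_profiling is not installed, pandas itself can print a much smaller, text-only summary. This sketch uses DataFrame.info() for the column types and non-null counts, DataFrame.describe() for basic statistics of the numeric columns, and DataFrame.corr() for the correlation between those columns. It is a lightweight alternative, not a replacement for the full report.

```python
import pandas as pd

# the student data from the pandas_profiling program above
studentdata = {
    "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
    "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra",
             "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
    "class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
    "birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002",
              "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
    "sex": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
}
df = pd.DataFrame(studentdata)

# number of rows, column names, data types and non-null counts
df.info()

# count, mean, std, min, quartiles and max of the numeric columns (rollno and class)
summary = df.describe()
print(summary)

# correlation between the numeric columns only
corr = df.corr(numeric_only=True)
print(corr)
```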






Visualization of Data


In this part of the tutorial, we visualize the data by answering a few queries (questions about the data).

The data used in these programs is declared inside the program, but it could also be imported from a file such as a .csv file.
The data describes a classroom and has six columns: rollno, name, class, birth, gender, and marks.

                     
    studentdata = {
            "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
            "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
            "class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
            "birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
            "gender": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
            "marks": [85, 90, 94, 87, 60, 77, 56, 40, 85, 60]
                  }
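As mentioned above, the same data could come from a .csv file instead. A minimal sketch, assuming a file named students.csv (a hypothetical name) with the same six columns: so that the example runs on its own, it first writes the dictionary to that file with DataFrame.to_csv(), then reads it back with pd.read_csv().

```python
import pandas as pd

studentdata = {
    "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
    "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra",
             "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
    "class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
    "birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002",
              "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
    "gender": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
    "marks": [85, 90, 94, 87, 60, 77, 56, 40, 85, 60],
}

# write the data to a CSV file (students.csv is a hypothetical file name);
# index=False keeps the DataFrame's row numbers out of the file
pd.DataFrame(studentdata).to_csv("students.csv", index=False)

# with a real data file you would start here and just read it
df = pd.read_csv("students.csv")
print(df.head())
```

Note that read_csv() keeps the birth column as plain text here; the dates are not parsed unless you ask for it.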
                     
                 





1. Boys and Girls in the classroom




First, we import the pandas and matplotlib modules and declare the data as a Python dictionary.
Then we create a DataFrame from the student data by calling pandas' DataFrame() constructor, and we extract the gender and name columns into x and y.
Finally, we create the bar graph with plt.bar(x, y, label='Students', color="red") and display it with plt.show().

                    
    # importing modules
    import pandas as pd
    import matplotlib.pyplot as plt

    # take student data
    studentdata = {
        "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
        "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
        "class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
        "birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
        "gender": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
        "marks": [85, 90, 94, 87, 60, 77, 56, 40, 85, 60]
    }

    # create data frame
    df = pd.DataFrame(studentdata)

    # extract gender and name into x and y variables
    x = df['gender']
    y = df['name']

    # create bar graph
    plt.bar(x, y, label='Students', color="red")

    # set x and y axis labels
    plt.xlabel('Student gender')
    plt.ylabel('Student name')

    # set school name
    plt.title('XYZ School')

    # show legend
    plt.legend()

    # display the graph
    plt.show()
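Because the program above passes names as bar heights, its bars do not show how many boys and girls there are. If the goal is to count them, one option is Series.value_counts(), which counts how often each value appears in the gender column; the counts are then drawn with plt.bar(). A short sketch using the gender column from the data above:

```python
import matplotlib.pyplot as plt
import pandas as pd

# the gender column from the student data above
df = pd.DataFrame({"gender": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"]})

# count how many times each value appears in the column
counts = df["gender"].value_counts()

# one bar per gender, bar height = number of students
bars = plt.bar(counts.index.tolist(), counts.tolist(), label="Students", color="red")

plt.xlabel("Student gender")
plt.ylabel("Number of students")
plt.title("XYZ School")
plt.legend()
plt.show()
```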

                    
                





2. Employee Id on X axis and Salaries on Y axis




First, we import the matplotlib module. Then we take two kinds of data: employee ids and the salary of each employee.
The company has two departments, so we create two sets of data, one for the sales department and one for the production department. The sales department's data is stored in the x and y variables, and the production department's data is stored in the x1 and y1 variables.
Next, we create the bar graphs with the plt.bar() function, set the axis labels with plt.xlabel() and plt.ylabel(), and display the graph with plt.show().

                  
   # importing modules
   import matplotlib.pyplot as plt
                     
   # take employee id's and salaries for sales department in x and y variables
   x = [101, 104, 105, 108, 109, 111]
   y = [10000, 12000, 18000, 18500, 13500, 22000]

   # take employee id's and salaries for production department in x1 and y1 variables
   x1 = [102, 103, 106, 107, 110, 112]
   y1 = [9000, 12000, 5000, 8000, 20000, 15000]

   # create bar graph
   plt.bar(x, y, label='Sales dept', color="blue")
   plt.bar(x1, y1, label='Production dept', color="gold")

   # set x and y axis labels
   plt.xlabel('Employee Ids')
   plt.ylabel('Salaries')

   # set company name
   plt.title('INFOSYS INC')

   # show legend
   plt.legend()

   # display the graph
   plt.show()
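To compare the two departments more directly, one option is to draw each department's average salary as a dashed horizontal line with plt.axhline(). This sketch uses the same x, y, x1 and y1 lists; the averages are simply sum divided by count.

```python
import matplotlib.pyplot as plt

# same employee ids and salaries as above
x = [101, 104, 105, 108, 109, 111]
y = [10000, 12000, 18000, 18500, 13500, 22000]
x1 = [102, 103, 106, 107, 110, 112]
y1 = [9000, 12000, 5000, 8000, 20000, 15000]

# average salary of each department
sales_avg = sum(y) / len(y)
production_avg = sum(y1) / len(y1)

plt.bar(x, y, label='Sales dept', color="blue")
plt.bar(x1, y1, label='Production dept', color="gold")

# dashed horizontal line at each department's average salary
plt.axhline(sales_avg, color="blue", linestyle="--", label=f"Sales average ({sales_avg:.0f})")
plt.axhline(production_avg, color="goldenrod", linestyle="--",
            label=f"Production average ({production_avg:.0f})")

plt.xlabel('Employee Ids')
plt.ylabel('Salaries')
plt.title('INFOSYS INC')
plt.legend()
plt.show()
```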
                  
               





3. Percentage of employees in each department




Next, we create a pie chart that shows the percentage of employees in each department.

                  
   # importing modules
   import matplotlib.pyplot as plt

   # take percentages of employees of 5 departments
   slices = [40, 20, 20, 15, 5]

   # take department names
   depts = ['Sales', 'Production', 'HR', 'Finance', 'Other']

   # take colors for each department
   cols = ['magenta', 'cyan', 'gold', 'blue', 'red']

   # create pie graph
   plt.pie(slices, labels=depts, colors=cols, startangle=90,  shadow=True, autopct='%.1f%%')

   # set company name
   plt.title('INFOSYS INC')

   # show legend
   plt.legend()

   # display the graph
   plt.show()
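plt.pie() also accepts an explode argument that pulls selected slices away from the center; this sketch uses it to highlight the Sales slice. By default, pie() divides each value by the total, so the slices do not have to add up to 100; here they already do (40 + 20 + 20 + 15 + 5 = 100).

```python
import matplotlib.pyplot as plt

slices = [40, 20, 20, 15, 5]
depts = ['Sales', 'Production', 'HR', 'Finance', 'Other']
cols = ['magenta', 'cyan', 'gold', 'blue', 'red']

# offset only the first slice (Sales) by 10% of the radius
explode = [0.1, 0, 0, 0, 0]

# with autopct set, pie() returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = plt.pie(slices, labels=depts, colors=cols, explode=explode,
                                   startangle=90, shadow=True, autopct='%.1f%%')

plt.title('INFOSYS INC')
plt.legend()
plt.show()
```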
                  
               





4. Growth of company by years




Next, we create a line graph that shows the growth of the company over the years.

                  
   # importing modules
   import matplotlib.pyplot as plt
                     
   # take years of company
   years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']
                     
   # take profits of company
   profits = [5, 15, 20, 10.5, 12.5, 18, 17.7, 15.5, 8.8, 9.75, 10.9]
                     
   # create line graph
   plt.plot(years, profits, 'purple')
                     
   # set company name
   plt.title('INFOSYS INC')
                     
   # set labels
   plt.xlabel('years')
   plt.ylabel('growth')
                     
   # display the graph
   plt.show()                     
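The graph above plots the yearly values directly. If you want the change from one year to the next instead, pandas can compute it: dividing each value by the previous year's value (Series.shift(1)), subtracting 1 and multiplying by 100 gives the percentage change. A sketch using the same years and profits lists:

```python
import matplotlib.pyplot as plt
import pandas as pd

years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']
profits = [5, 15, 20, 10.5, 12.5, 18, 17.7, 15.5, 8.8, 9.75, 10.9]

# index the profits by year
s = pd.Series(profits, index=years)

# percentage change from the previous year; 2010 has no previous year, so it becomes NaN
change = (s / s.shift(1) - 1) * 100

# drop the NaN for 2010 and draw the yearly changes as bars
change = change.dropna()
plt.bar(change.index.tolist(), change.tolist(), color="purple")
plt.axhline(0, color="black", linewidth=0.8)  # reference line at 0% change

plt.title('INFOSYS INC')
plt.xlabel('years')
plt.ylabel('change from previous year (%)')
plt.show()
```

Bars below the 0% line mark the years in which the value fell compared with the year before.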
                  
               

In the same way, we can visualize many other kinds of data in Python with matplotlib.



And we're done here


I hope you liked this article about data visualization in Python. If you have any questions, please leave a comment below.

Conclusion:
In this tutorial, we saw the basics of data visualization in Python using pandas and matplotlib. We worked through some queries that show how to visualize basic data and how to create different types of graphs (bar, pie, and line graphs) with matplotlib.

