BY TEJAS
We all know how important data is in today's world. For example, Facebook uses data to decide which advertisements to show its users.
When data is presented as pictures or graphs, it becomes much easier to understand. In simple words, "Data visualization is the representation of data in the form of pictures or graphs".
For data visualization in Python, we use "pandas" and "matplotlib". There are many alternatives to "matplotlib", such as Seaborn, GGplot, Pygal, and Bokeh.
Look at this simple program, which plots student data on a graph using "pandas" and "matplotlib". It gives a first idea of what data visualization is.
The code of the program is:
# importing modules
import pandas as pd
import matplotlib.pyplot as plt
# take student data
studentdata = {
"rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
"name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
"class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
"birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
"sex": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"]
}
# create data frame
df = pd.DataFrame(studentdata)
# extract roll no and name into x and y
x = df['rollno']
y = df['name']
# create bar graph
plt.bar(x, y, label='Student Data', color='green')
# set x and y axis labels
plt.xlabel('Student Roll No')
plt.ylabel('Student Names')
# set school name
plt.title('XYZ School')
# show legend
plt.legend()
# show graph
plt.show()
There is a module called "pandas_profiling" (published under the name ydata-profiling in newer releases) that is very powerful for analyzing data in Python. It reports information such as how many rows and cells the data has, how many variables there are, and the type of each variable.
# importing modules
import pandas as pd
import matplotlib.pyplot as plt
import pandas_profiling as pp
# take student data
studentdata = {
"rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
"name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
"class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
"birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
"sex": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"]
}
# create data frame
df = pd.DataFrame(studentdata)
# create the ProfileReport object
profile = pp.ProfileReport(df)
# write the report to an html file
profile.to_file("output.html")
pandas_profiling summarizes all the information about the data in a single report for analysis.
The report provided by pandas_profiling includes sections such as Overview, Correlation, and Interactions.
The pandas_profiling module gives a full analysis of the data, such as the number of variables, the number of observations, the number of cells, and much more. Just click on the part of the report you want to see in depth.
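The basic counts mentioned above (observations, variables, cells, and variable types) can also be read with plain pandas. This is only a minimal sketch using standard DataFrame attributes, not a replacement for the full report, and the data is shortened to three columns of the same student data.

```python
import pandas as pd

# same student data as above, shortened to three columns for brevity
studentdata = {
    "rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
    "class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
    "sex": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
}
df = pd.DataFrame(studentdata)

# number of observations (rows) and variables (columns)
n_rows, n_cols = df.shape
# total number of cells (rows * columns)
n_cells = df.size
# number of missing cells in the whole frame
n_missing = int(df.isna().sum().sum())

print(n_rows, n_cols, n_cells, n_missing)
# type of each variable
print(df.dtypes)
# summary statistics for the numeric columns
print(df.describe())
```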
In this tutorial, we visualize the data by using queries.
The data used in these programs is declared inside the program, but it can also be imported from files such as .csv.
The data describes a classroom and has 6 fields: rollno, name, class, birth, gender, and marks.
studentdata = {
"rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
"name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
"class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
"birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
"gender": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
"marks": [85, 90, 94, 87, 60, 77, 56, 40, 85, 60]
}
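As a rough sketch of the .csv option, the same fields could be loaded with pd.read_csv(). The file name "students.csv" is hypothetical, so the example reads from an in-memory string with the first three students to stay self-contained.

```python
import io
import pandas as pd

# stand-in for the contents of a hypothetical file "students.csv"
csv_text = """rollno,name,class,birth,gender,marks
1,Rahul Pawar,11,14-9-2003,M,85
2,Tejas Magade,12,7-3-2002,M,90
7,Bharat Kharje,12,15-9-2002,M,94
"""

# with a real file this would be: df = pd.read_csv("students.csv")
df = pd.read_csv(io.StringIO(csv_text))

# the header row of the file becomes the column names
print(df.columns.tolist())
print(len(df))
```

The resulting DataFrame can be used in the programs below in place of the one built from the dictionary.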
1. Boys and Girls in the classroom
First, we import the matplotlib and pandas modules and declare the data using a Python dictionary.
Then we create a DataFrame from the student data by calling the DataFrame() constructor of pandas, and extract the gender and name columns into x and y.
Next, we create the bar graph using plt.bar(x, y, label='Students', color="red"), and display it using plt.show().
# importing modules
import pandas as pd
import matplotlib.pyplot as plt
# take student data
studentdata = {
"rollno": [1, 2, 7, 10, 5, 6, 3, 8, 9, 4],
"name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra", "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
"class": [11, 12, 12, 9, 12, 11, 10, 10, 9, 11],
"birth": ["14-9-2003", "7-3-2002", "15-9-2002", "20-12-2004", "30-3-2002", "20-4-2003", "5-6-2003", "5-5-2003", "23-6-2004", "25-9-2003"],
"gender": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
"marks": [85, 90, 94, 87, 60, 77, 56, 40, 85, 60]
}
# create data frame
df = pd.DataFrame(studentdata)
# extract gender and name into x and y variables
x = df['gender']
y = df['name']
# create bar graph
plt.bar(x, y, label='Students', color="red")
# set x and y axis labels
plt.xlabel('Student gender')
plt.ylabel('Student name')
# set school name
plt.title('XYZ School')
# show legend
plt.legend()
# display the graph
plt.show()
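The program above plots names against gender. If the goal is to see how many boys and girls are in the classroom, one possible variation is to count the rows per gender first. This is a sketch that swaps in pandas value_counts(), which the original program does not use, and the "Number of students" label is my own choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# name and gender columns from the student data above
df = pd.DataFrame({
    "name": ["Rahul Pawar", "Tejas Magade", "Bharat Kharje", "Pravin Gupta", "Hema Chandra",
             "Ganesh Rao", "Anil Kumar", "Anant Nag", "Laxmi Prasanna", "Suraj Pawar"],
    "gender": ["M", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
})

# count how many students fall into each gender category
counts = df["gender"].value_counts()

# one bar per gender, bar height = number of students
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values, color="red")
ax.set_xlabel("Student gender")
ax.set_ylabel("Number of students")
ax.set_title("XYZ School")
# call plt.show() here to display the graph, as in the program above
```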
2. Employee Id on X axis and Salaries on Y axis
First, we import the matplotlib module. Then we take two kinds of data: the employee id and the salary of each employee.
There are two departments in the company, so we create two data sets, one for the sales department and one for the production department. The data of the sales department is stored in the x and y variables, and the data of the production department is stored in the x1 and y1 variables.
After creating the data, it is time to create the bar graphs.
To create the bar graphs we use the plt.bar() function. Then we set the axis labels using plt.xlabel() and plt.ylabel(), and finally display the graph using plt.show().
# importing modules
import matplotlib.pyplot as plt
# take employee id's and salaries for sales department in x and y variables
x = [101, 104, 105, 108, 109, 111]
y = [10000, 12000, 18000, 18500, 13500, 22000]
# take employee id's and salaries for production department in x1 and y1 variables
x1 = [102, 103, 106, 107, 110, 112]
y1 = [9000, 12000, 5000, 8000, 20000, 15000]
# create bar graph
plt.bar(x, y, label='Sales dept', color="blue")
plt.bar(x1, y1, label='Production dept', color="gold")
# set x and y axis labels
plt.xlabel('Employee Ids')
plt.ylabel('Salaries')
# set company name
plt.title('INFOSYS INC')
# show legend
plt.legend()
# display the graph
plt.show()
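As a possible follow-up query on the same salaries, the two departments can be compared by their average salary. This sketch assumes the data is first put into a pandas DataFrame with "dept" and "salary" columns (names chosen here for illustration) and uses groupby(), which the program above does not use.

```python
import pandas as pd
import matplotlib.pyplot as plt

# salaries from the program above
sales = [10000, 12000, 18000, 18500, 13500, 22000]
production = [9000, 12000, 5000, 8000, 20000, 15000]

# one row per employee, tagged with the department name
df = pd.DataFrame({
    "dept": ["Sales"] * len(sales) + ["Production"] * len(production),
    "salary": sales + production,
})

# average salary per department (groupby sorts the keys: Production, Sales)
mean_salary = df.groupby("dept")["salary"].mean()

# one bar per department, using the same colors as above
fig, ax = plt.subplots()
ax.bar(mean_salary.index, mean_salary.values, color=["gold", "blue"])
ax.set_xlabel("Department")
ax.set_ylabel("Average salary")
ax.set_title("INFOSYS INC")
# call plt.show() here to display the graph
```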
3. Percentage of employees in each department
Now we are creating a pie graph which shows the percentage of employees in each department.
# importing modules
import matplotlib.pyplot as plt
# take percentages of employees of 5 departments
slices = [40, 20, 20, 15, 5]
# take department names
depts = ['Sales', 'Production', 'HR', 'Finance', 'Other']
# take colors for each department
cols = ['magenta', 'cyan', 'gold', 'blue', 'red']
# create pie graph
plt.pie(slices, labels=depts, colors=cols, startangle=90, shadow=True, autopct='%.1f%%')
# set company name
plt.title('INFOSYS INC')
# show legend
plt.legend()
# display the graph
plt.show()
4. Growth of company by years
Now we are creating a line graph which shows the growth of the company by years.
# importing modules
import matplotlib.pyplot as plt
# take years of company
years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']
# take profits of company
profits = [5, 15, 20, 10.5, 12.5, 18, 17.7, 15.5, 8.8, 9.75, 10.9]
# create line graph
plt.plot(years, profits, color='purple')
# set company name
plt.title('INFOSYS INC')
# set labels
plt.xlabel('years')
plt.ylabel('growth')
# display the graph
plt.show()
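The line graph shows the profit for each year. To read the growth from one year to the next as numbers, the differences can be computed with pandas. This is a small sketch using Series.diff(), and the variable names are my own.

```python
import pandas as pd

# years and profits from the program above
years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']
profits = [5, 15, 20, 10.5, 12.5, 18, 17.7, 15.5, 8.8, 9.75, 10.9]

# profits indexed by year
series = pd.Series(profits, index=years)

# change in profit compared with the previous year (the first year has no previous value)
change = series.diff()

# year with the largest increase over the previous year
best_year = change.idxmax()

print(change)
print(best_year)
```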
Likewise, we can visualize data in Python by using matplotlib.
I hope you liked this article about data visualization in Python. If you have any doubts, please comment down below.
Conclusion:-
In this tutorial, we saw the basics of data visualization in Python. We used pandas and matplotlib to work through some queries that show how to visualize basic data and how to create different types of graphs.