Python works quite well for data analysis, you have a lot of libraries which make analyzis much simpler, Pandas is one of them which basically consist in managing DataFrames for manipulating data.
Filtering data in a pandas dataframe consist in selecting rows based on a criteria. There are many ways to filter data, we are going to start to check two of them in the following article.
If you came from using Sql this would be the equivalent of using “Where” clause when filtering rows from a table.
For this example we are going to use “cop shooting dataset” that can be downloaded from the following url https://www.kaggle.com/andrewmvd/police-deadly-force-usage-us.
First we need to import pandas library
import pandas as pd
Then we are going to read file that we loaded. Change the path to math with our file Directory.
police_shootings = pd.read_csv("H:\Datasets\datasets_686757_1204345_fatal-police-shootings-data.csv")
As a result of that now we have a dataFrame called “police_shootings”.
We are going to explore the current DataFrame and obtain the top 5 values that we have on it.
police_shootings.head(n = 5)
Then we are going to drop some dataframe columns in order to make this database more digestible.
police_shootings.drop('signs_of_mental_illness', axis=1, inplace = True) police_shootings.drop('threat_level', axis=1, inplace = True) police_shootings.drop('flee', axis=1, inplace = True)
Final result would be a DataFrame witth 11 columns and 5338 rows.
2. Subsetting rows by conditions.
One of the simplest techniques for finding a dataset is to find a subset of rows that match some criteria.
The mosty common way to do it is by using relational operators that return “True” or “False” for each row and that can be done by passing that insite brackets.
3.1 Filtering rows by a condition
For example if we want to know how many shootings occured to people under 18 years old.
As you can see there were 98 deadly shootings with police from 2015 to 2020 that involved people with under 10 years old.
Also if we want to know the shootings that occurred starting from 2019-01-01.
3.2 Filtering rows by multiple conditions
Also we can use two or more conditions in order to subset. For this we are going to use “age” and “armed” columns.
First we are going to get unique values of “armed” column in order to filter one of those values.
Second we are going to filter shootings were suspects under 18 years and toy weapons were involved.
police_shootings[(police_shootings["age"]<18) & (police_shootings["armed"]=="toy weapon")]
Do not forget to separate with parenthesis () every condition.
If we want to add another condition to the filtering we should and and “&” followed by the condition.
In this case we would like to add another condition, related with the gender in the shootings.
police_shootings[(police_shootings["age"]<18) & (police_shootings["armed"]=="toy weapon") & (police_shootings["gender"]=="F")]
3.3 Filtering rows using query function
There is a function called query that enables you to do the same filtering on dataframes in python.
police_shootings.query('age <18 and armed== "toy weapon" and gender=="F"')
But this function syntax is easier and has more options than the previous one.
For example you can use inplace which modify the data in place “True” or return a modified copy “False”. Default behavior is “False”
police_shootings.query('age <18 and armed== "toy weapon" and gender=="F"' ,inplace=True)
For example after applying this code, dataframe is going to be changed and you can display this by displaying the dataframe.