Gaining insights into the Chapel Hill crime dataset
By: CarolinaDataDragons
Aan Patel, Linglu Xu, Peiyuan Ma, Yue Feng
1. Insights from mapping crime data
The crimes by area tab shows that the area with zip code 27514, located in the central part of Chapel Hill, has the largest number of criminal incident reports. A likely explanation is that the central part of Chapel Hill has a higher population density than the outer parts, which could lead to more reported incidents in that area.

From the armed crime map tab, we can infer that the number of armed crime incidents is minuscule compared to the number of incidents marked null in the dataset. Among the armed crime records, handguns are the most common weapon, followed by rifles and then shotguns. The location (35.9340, -79.0348), a corner surrounded by forest, has the most armed cases. More than half of the cases are clustered along E Franklin Street.

From the top 5 offences tab, we can infer that the distribution of offence types is quite cluttered, with no clear spatial separation between types, so clustering methods like K-means are unlikely to be effective. The same holds for neighbor-based methods such as K Nearest Neighbors, which is a classifier rather than a clustering method.
2. Insights from sorting crime data
Among places with reported cases, prisons, rental and commercial storage facilities, liquor stores, places near lakes and other water bodies, and jewelry stores generally had the fewest incidents, while streets, residential areas, and parking lots had the most.

The distribution in the top 5 offenses tab is similar to that in the armed crime map tab, but MLK Jr Blvd shows a notably large number of “information” type offenses relative to other types of offense.
3. Gender, victims and arrests
The arrests by gender over years tab shows that total arrests are trending downward for both males and females. In every year shown, the number of male suspects is greater than the number of female suspects. The victims by gender over years tab shows that, in every year shown, more females than males were victimized.
4. Insights into types of offences
From the data on offense types, the most frequent unarmed cases are “trespassing”, “assist ems”, “domestic disturbance” and “information”, while among the armed cases the handgun is consistently the most common weapon used by suspects. Among armed suspects, vandalism is the most frequent offense.
5. How age affects "when" people commit crimes
The chart below shows that young people are more likely to commit crimes late at night and in the early hours of the morning, and that the average age of female arrestees is lower than that of male arrestees.
Note: The date shown on hover appears because the time strings from the dataset were converted into “datetime” objects, which assigns the same date to every row. Since we are only concerned with the time of day of each record here, the date can be ignored.
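The effect described in the note can be reproduced with Python's standard library. This is a minimal sketch, assuming the times are stored as "HH:MM" strings (the dataset's actual format may differ). The helper name parse_time_of_day and the sample values are ours, for illustration only. When the format string contains no date fields, datetime.strptime fills in the default date 1900-01-01 for every value.

```python
from datetime import datetime

def parse_time_of_day(text, fmt="%H:%M"):
    """Parse a time-only string; strptime supplies the default date 1900-01-01."""
    return datetime.strptime(text, fmt)

# Hypothetical sample values, not taken from the dataset.
samples = ["02:15", "14:30", "23:45"]
parsed = [parse_time_of_day(s) for s in samples]

# Every parsed value carries the same placeholder date ...
dates = {p.date() for p in parsed}
# ... so only the time-of-day part is meaningful for the chart.
hours = [p.hour + p.minute / 60 for p in parsed]

print(dates)
print(hours)
```

Plotting against the fractional hour (or simply ignoring the date part) keeps the time-of-day comparison intact.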
6. Looking for improvements with time
Finally, by plotting a regression line over the number of reported incidents involving guns by year, we found that the number of such incidents shows a declining trend (negative slope of the regression line), which might be a good sign and may indicate that our community is becoming safer over time.
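To make the "negative slope" check concrete, here is a minimal sketch of an ordinary least-squares line fit in plain Python. The function name fit_line is ours, and the yearly counts are made-up placeholders, not the Chapel Hill figures. The regression lines in this report came from the charting tool, not from this code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up yearly counts for illustration only; NOT the Chapel Hill figures.
years = [2011, 2012, 2013, 2014, 2015]
counts = [70, 66, 61, 63, 55]

slope, intercept = fit_line(years, counts)
trend = "declining" if slope < 0 else "not declining"
print(f"slope = {slope:.2f} incidents per year ({trend})")
```

A negative slope only describes the fitted line. With few data points it is worth checking how much the counts scatter around that line before reading it as a real trend.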
Predictions
We analysed how armed cases and the other measures above change with time, and fitted linear regression lines to the data to make some predictions:
Arrests by gender over years
Male:
Number of Records = -0.168877*Year of Date Of Arrest + 8274.45
Female:
Number of Records = -0.0273435*Year of Date Of Arrest + 1439.97
Victims by gender over years
Male:
Number of Records = -22.5818*Year of date_of_occurrence + 47284.1
Female:
Number of Records = -38.4364*Year of date_of_occurrence + 79488.5
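As a worked example of reading these equations, the sketch below evaluates the two victim trend lines for a single year. It assumes that "Year of date_of_occurrence" enters the equation as a plain calendar year number, which is our assumption about how the charting tool encodes the axis. The year 2015 is a hypothetical input chosen only to show the arithmetic.

```python
# Coefficients copied from the two victim trend lines above.
VICTIM_LINES = {
    "Male": (-22.5818, 47284.1),
    "Female": (-38.4364, 79488.5),
}

def predicted_victims(gender, year):
    """Evaluate Number of Records = slope * year + intercept for one gender."""
    slope, intercept = VICTIM_LINES[gender]
    return slope * year + intercept

# Assumption: the year enters the equation as a plain calendar number.
year = 2015  # hypothetical input, chosen only to illustrate the arithmetic
for gender in VICTIM_LINES:
    print(gender, round(predicted_victims(gender, year)))
```

Under that assumption the lines give roughly 1,782 male and 2,039 female records for 2015, in line with the chart showing more female than male victims. The steeper female slope means the fitted gap narrows from year to year.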
Average age vs time of arrest
Male:
Avg. Arrestees Age = 9.28471*Hour of Time Of Arrest + 12.7136
Female:
Avg. Arrestees Age = 7.53482*Hour of Time Of Arrest + 14.4444
Reported incidents with weapons
Count of weapon_description = -0.000503599*Month of date_of_occurrence + 26.5486