This example uses data from the 2018 Central Park Squirrel Census. The Squirrel Census is a storytelling project focused on the Eastern gray squirrel: the project counts squirrels and presents its findings.
The table contains 3,023 sightings. For each one it records the location (longitude, latitude and hectare), the timing (the date and a morning or late-afternoon shift), and attributes such as age, fur color and current activity. Hope you like stories about our furry friends :)
Source for data description: https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-10-29
Source for data: https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/vfnx-vebw
Before we start going through the data, it is important to look at the number of missing values in the columns.
The columns Highlight Fur Color, Color Notes, Specific Location, Other Activities and Other Interactions each have a lot of missing values (>1000), so we will drop them. We will also drop Lat/Long since it repeats the data in columns X and Y.
The other columns have only a small number of missing values (fewer than 200), so we will retain them.
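The missing-value check described above can be sketched in pandas roughly as follows. This is a minimal illustration on a made-up four-row table, not the original analysis code: the toy values and the `threshold` of 2 are assumptions (on the full table the cutoff used in the text is 1000).

```python
import pandas as pd

# Toy stand-in for the census table (assumed values; the real table has 3,023 rows).
df = pd.DataFrame({
    "X": [-73.96, -73.97, -73.95, -73.96],
    "Age": ["Adult", None, "Juvenile", "Adult"],
    "Color Notes": [None, None, None, "white tail tip"],
})

# Number of missing values in each column.
missing = df.isna().sum()
print(missing)

# Hypothetical cutoff for this toy table; the text uses 1000 on the full data.
threshold = 2
to_drop = missing[missing > threshold].index.tolist()

# Drop the sparse columns, keep everything else.
df_clean = df.drop(columns=to_drop)
print(df_clean.columns.tolist())
```

Columns with only a few gaps, like Age here, stay in the table with their missing values intact.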
It's also important to check the data types of the columns to make sure each one has the type we expect. This also helps to spot errors in data entry.
X                                             float64
Y                                             float64
Unique Squirrel ID                            object
Hectare                                       object
Shift                                         object
Date                                          int64
Hectare Squirrel Number                       int64
Age                                           object
Primary Fur Color                             object
Combination of Primary and Highlight Color    object
Location                                      object
Above Ground Sighter Measurement              object
Running                                       bool
Chasing                                       bool
Climbing                                      bool
Eating                                        bool
Foraging                                      bool
Kuks                                          bool
Quaas                                         bool
Moans                                         bool
Tail flags                                    bool
Tail twitches                                 bool
Approaches                                    bool
Indifferent                                   bool
Runs from                                     bool
dtype: object
It is also worth looking at the number of unique entries in each column, to help decide how to use them.
X                                             3023
Y                                             3023
Unique Squirrel ID                            3018
Hectare                                       339
Shift                                         2
Date                                          11
Hectare Squirrel Number                       23
Age                                           3
Primary Fur Color                             3
Combination of Primary and Highlight Color    22
Location                                      2
Above Ground Sighter Measurement              41
Running                                       2
Chasing                                       2
Climbing                                      2
Eating                                        2
Foraging                                      2
Kuks                                          2
Quaas                                         2
Moans                                         2
Tail flags                                    2
Tail twitches                                 2
Approaches                                    2
Indifferent                                   2
Runs from                                     2
dtype: int64
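Summaries like the two above come from `DataFrame.dtypes` and `DataFrame.nunique()`. The sketch below runs both on a made-up three-row table. It also shows one way to turn the integer Date column into real dates, assuming the values are packed as MMDDYYYY (for example 10142018); that layout is an assumption here and should be checked against the data description.

```python
import pandas as pd

# Toy stand-in with a few of the census columns (assumed values).
df = pd.DataFrame({
    "Shift": ["AM", "PM", "PM"],
    "Date": [10142018, 10062018, 10142018],
    "Running": [False, True, False],
})

# Data type of each column.
print(df.dtypes)

# Number of distinct values in each column.
unique_counts = df.nunique()
print(unique_counts)

# Assumption: Date is an integer in MMDDYYYY form.
# zfill(8) restores a leading zero for single-digit months.
dates = pd.to_datetime(df["Date"].astype(str).str.zfill(8), format="%m%d%Y")
print(dates)
```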
To start, we look at how squirrel sightings are distributed across the two shifts, morning (AM) and afternoon (PM), and across the three primary fur colors: Black, Cinnamon and Gray. Gray squirrels were sighted most often, followed by cinnamon and then black. There was no large difference between the morning and afternoon sightings. Guess gray isn't as gloomy as rain clouds make it seem.
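A count of sightings by color and shift can be built with `pd.crosstab`. The sketch below uses a made-up five-row table, so the numbers are illustrative only and say nothing about the real distribution.

```python
import pandas as pd

# Toy stand-in for the two columns of interest (assumed values).
df = pd.DataFrame({
    "Shift": ["AM", "PM", "PM", "AM", "PM"],
    "Primary Fur Color": ["Gray", "Gray", "Cinnamon", "Black", "Gray"],
})

# Rows are fur colors, columns are shifts, cells are sighting counts.
counts = pd.crosstab(df["Primary Fur Color"], df["Shift"])
print(counts)

# Total sightings per color, most common first.
per_color = counts.sum(axis=1).sort_values(ascending=False)
print(per_color)
```

Calling `counts.plot(kind="bar")` on the result would give a grouped bar chart, assuming matplotlib is installed.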
Another neat thing pointed out in the dataset is the variation in squirrel fur color. In this section we look at the different color variations for each of the three main colors: Black, Cinnamon and Gray.
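One way to list the variations within each main color is to group by Primary Fur Color and count the values of Combination of Primary and Highlight Color. The combination strings in this sketch (such as "Gray+White") are made-up toy values.

```python
import pandas as pd

combo = "Combination of Primary and Highlight Color"

# Toy stand-in (assumed values).
df = pd.DataFrame({
    "Primary Fur Color": ["Gray", "Gray", "Gray", "Cinnamon", "Black"],
    combo: ["Gray+White", "Gray+White", "Gray+Cinnamon", "Cinnamon+Gray", "Black+"],
})

# For each primary color, count how often each combination appears.
variants = df.groupby("Primary Fur Color")[combo].value_counts()
print(variants)

# Number of distinct variations per primary color.
n_variants = df.groupby("Primary Fur Color")[combo].nunique()
print(n_variants)
```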
Squirrels can make several different vocal sounds. The dataset records three of them: kuks, quaas and moans. Kuks were the most common sound noted, with very few squirrels heard moaning.
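Because Kuks, Quaas and Moans are boolean columns, summing each one gives the number of sightings in which that sound was heard. A minimal sketch on made-up data:

```python
import pandas as pd

# Toy stand-in for the three sound columns (assumed values).
df = pd.DataFrame({
    "Kuks":  [True, True, False, True, False],
    "Quaas": [False, True, False, False, False],
    "Moans": [False, False, False, False, False],
})

sound_cols = ["Kuks", "Quaas", "Moans"]

# True counts as 1 and False as 0, so the sum is the number of sightings per sound.
sound_counts = df[sound_cols].sum().sort_values(ascending=False)
print(sound_counts)
```

The same sum works for any of the other boolean columns, such as Running, Climbing or Chasing.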
Source for sounds: Squirrel Alarm Calls Are Surprisingly Complex
In terms of activities, the squirrels were seen running, climbing and chasing. Running and climbing made up the majority of the activities noted, with chasing a distant third.
One neat thing to find out is how friendly the squirrels in Central Park were. To do this, one can use the columns Indifferent, Approaches and Runs from, which document whether the squirrel was indifferent to human presence, approached humans or ran away. In most interactions the squirrels were indifferent to human presence. Some squirrels ran away, whereas a small number approached humans when they saw them.
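To compare the three reactions on a common scale, one option is to take the mean of each boolean column, which gives the share of sightings showing that reaction. The data in this sketch is made up and the resulting shares are illustrative only.

```python
import pandas as pd

# Toy stand-in for the three interaction columns (assumed values).
df = pd.DataFrame({
    "Indifferent": [True, True, True, False, False],
    "Approaches":  [False, False, False, True, False],
    "Runs from":   [False, False, False, False, True],
})

interaction_cols = ["Indifferent", "Approaches", "Runs from"]

# The mean of a boolean column is the fraction of rows that are True.
share = df[interaction_cols].mean()
print(share)

# The most common reaction in this toy table.
most_common = share.idxmax()
print(most_common)
```

Note that in the real table a single sighting may have more than one of these flags set, or none, so the shares need not add up to 1.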
To record the location of the sightings, the dataset divides the total area of Central Park into a grid of hectares. We visualize the number of squirrels found in each cell of the hectare grid.
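A rough sketch of how the per-hectare counts could be arranged into a grid is shown below. It assumes each Hectare ID is a number followed by a single letter (for example "37F"), with the number giving the grid row and the letter the grid column. That ID layout, the toy values and the `row`/`col` names are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the Hectare column (assumed values).
df = pd.DataFrame({"Hectare": ["37F", "37F", "01A", "02B", "01A", "37F"]})

# Sightings per hectare.
per_hectare = df["Hectare"].value_counts()
print(per_hectare)

# Assumption: the ID is digits (grid row) followed by one letter (grid column).
parts = df["Hectare"].str.extract(r"^(?P<row>\d+)(?P<col>[A-Z])$")

# Count sightings per (row, col) cell and spread the columns out into a grid.
grid = (
    parts.assign(row=parts["row"].astype(int))
         .groupby(["row", "col"])
         .size()
         .unstack(fill_value=0)
)
print(grid)
```

The resulting `grid` frame can be handed to a heatmap function (for example matplotlib's `imshow`) to draw the map. Only hectares that appear in the data show up in it.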
Another interesting question is which fur color dominated each hectare. This is explored here.
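One simple way to define the dominant color of a hectare is the color with the most sightings there. The sketch below counts sightings per hectare and color, then picks the largest count in each row. The data is made up, and "most sightings" is just one possible definition of dominance.

```python
import pandas as pd

# Toy stand-in (assumed values).
df = pd.DataFrame({
    "Hectare": ["37F", "37F", "37F", "01A", "01A", "01A"],
    "Primary Fur Color": ["Gray", "Gray", "Cinnamon", "Black", "Black", "Gray"],
})

# Sightings per hectare (rows) and color (columns).
counts = (
    df.groupby(["Hectare", "Primary Fur Color"])
      .size()
      .unstack(fill_value=0)
)
print(counts)

# The color with the highest count in each hectare.
# On a tie, idxmax returns the first column, i.e. the alphabetically first color.
dominant = counts.idxmax(axis=1)
print(dominant)
```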
This ends the furry analysis of Central Park squirrels. I will return with another interesting dataset to look into :)