Governors of the United States
This blog post walks through my exploratory data analysis (EDA) project on U.S. governors. The project aims to uncover insights such as which governor served the shortest term and which political affiliations have dominated. Beyond these findings, the project was a valuable opportunity to develop my skills.
The Data
Sourced from Kaggle, the dataset contains information about U.S. governors. It excludes territories and specific states from the Thirteen Colonies that had a presidential office. The CSV file has eight columns:
StateFull, StateAbbrev, GovernorNumber, GovernorName, TookOffice, LeftOffice, PartyAffiliation, and PartyAbbrev.
This publicly available dataset, owned by Brandon Conrady, was last updated on May 12, 2021, and is released under the CC0: Public Domain license.
Some Questions & Answers
- Which governors share the same seat number? The lower the seat number, the more governors share it: every state has a first governor, but only states with long lists of governors reach the higher seat numbers.
- What is the earliest inaugural date? January 10, 1769.
- How many distinct party affiliations are there? 34 across all states combined. Interestingly, Rhode Island has the most, with 11.
- Which state has had the most governors so far? South Carolina, with 91.
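The four answers above come down to grouping and counting. The sketch below shows one way to run the same queries in base R; it is not the post's original code, and the six-row `governors` data frame is hypothetical toy data (placeholder states, names, dates, and parties) that borrows the tidied column names used later in the post.

```r
# Hypothetical toy data (placeholder states, names, dates, and parties);
# column names match the tidied names used later in the post.
governors <- data.frame(
  state_full_name     = c("State A", "State A", "State A", "State B", "State B", "State C"),
  governor_seat_order = c(1, 2, 3, 1, 2, 1),
  governor_full_name  = paste("Governor", 1:6),
  took_office         = as.Date(c("1790-01-01", "1794-01-01", "1798-01-01",
                                  "1785-06-15", "1789-06-15", "1800-03-01")),
  party_affiliation   = c("Party X", "Party Y", "Party Z", "Party X", "Party X", "Party Y"),
  stringsAsFactors    = FALSE
)

# 1. How many governors share each seat number, and who they are.
seat_counts <- table(governors$governor_seat_order)
governors_by_seat <- split(governors$governor_full_name, governors$governor_seat_order)

# 2. Earliest inaugural date (na.rm guards against unconverted dates).
earliest <- min(governors$took_office, na.rm = TRUE)

# 3. Distinct party affiliations overall and per state.
n_parties <- length(unique(governors$party_affiliation))
parties_per_state <- tapply(governors$party_affiliation, governors$state_full_name,
                            function(p) length(unique(p)))

# 4. State with the most governors.
gov_counts <- table(governors$state_full_name)
most_governors <- names(gov_counts)[which.max(gov_counts)]

print(seat_counts)
print(earliest)
print(n_parties)
print(parties_per_state)
print(most_governors)
```

With dplyr, the same questions map onto `count()`, `summarise()`, and `n_distinct()`; base R is used here only so the sketch runs without extra packages.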
Tidying the Data
Several tidying steps improve the dataset: converting the date columns, standardizing column names, and correcting inaccuracies such as a date error involving November. The process involves replacing problematic dates, keeping the formatting consistent, and handling the issues that surface during the as.Date conversion.
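Standardizing names here means mapping the eight original CamelCase headers to the snake_case names that appear in the post's later output (for example, GovernorNumber becomes governor_seat_order). Below is a minimal base R sketch, assuming a lookup vector; `name_map` and the one-row `raw` data frame are hypothetical, and the party fields are placeholders. With dplyr, `rename(state_full_name = StateFull, ...)` does the same job.

```r
# The eight original CSV headers mapped to the snake_case names used later in the post.
name_map <- c(
  StateFull        = "state_full_name",
  StateAbbrev      = "state_abbrev",
  GovernorNumber   = "governor_seat_order",
  GovernorName     = "governor_full_name",
  TookOffice       = "took_office",
  LeftOffice       = "left_office",
  PartyAffiliation = "party_affiliation",
  PartyAbbrev      = "party_abbrev"
)

# Hypothetical one-row stand-in for the raw CSV; party fields are placeholders.
raw <- data.frame(
  StateFull = "Alabama", StateAbbrev = "AL", GovernorNumber = 1,
  GovernorName = "William Wyatt Bibb",
  TookOffice = "November 9, 1819", LeftOffice = "July 10, 1820",
  PartyAffiliation = "placeholder", PartyAbbrev = "placeholder",
  stringsAsFactors = FALSE
)

# Look up each current header; stop if any header is missing from the map.
new_names <- unname(name_map[names(raw)])
stopifnot(!anyNA(new_names))
names(raw) <- new_names

print(names(raw))
```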
Dataset:
```
## # A tibble: 2,587 x 3
##    governor_full_name     took_office       left_office
##    <chr>                  <chr>             <chr>
##  1 William Wyatt Bibb     November 9, 1819  July 10, 1820
##  2 Thomas Bibb            July 10, 1820     November 9, 1821
##  3 Israel Pickens         November 9, 1821  November 25, 1825
##  4 John Murphy            November 25, 1825 November 25, 1829
##  5 Gabriel Moore          November 25, 1829 March 3, 1831
##  6 Samuel Moore           March 3, 1831     November 26, 1831
##  7 John Gayle             November 26, 1831 November 21, 1835
##  8 Clement Comer Clay     November 21, 1835 July 17, 1837
##  9 Hugh McVay             July 17, 1837     November 21, 1837
## 10 Arthur Pendleton Bagby November 21, 1837 November 22, 1841
## # ... with 2,577 more rows
```
Code:
```r
library(dplyr)  # provides mutate(), select(), and the %>% pipe

#................ convert & save .............................
# Parse strings such as "November 9, 1819" into Date objects.
StateGov_df %>%
  mutate(took_office = as.Date(took_office, format = "%B %d, %Y"),
         left_office = as.Date(left_office, format = "%B %d, %Y")) -> StateGov_df

#................ view ........
StateGov_df %>%
  select(governor_full_name, took_office, left_office) %>%
  head(n = 3)
```
Result:
```
## # A tibble: 3 x 3
##   governor_full_name took_office left_office
##   <chr>              <date>      <date>
## 1 William Wyatt Bibb 1819-11-09  1820-07-10
## 2 Thomas Bibb        1820-07-10  1821-11-09
## 3 Israel Pickens     1821-11-09  1825-11-25
```
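Once both columns are stored as Date, a term's length is a plain subtraction, which is what the shortest-term question from the introduction needs. The sketch below applies that to the ten rows printed earlier in this section. It finds only the shortest term among those ten rows, not across the full dataset, and the `terms` and `days_in_office` names are illustrative.

```r
# The ten rows printed above, still as text.
terms <- data.frame(
  governor_full_name = c("William Wyatt Bibb", "Thomas Bibb", "Israel Pickens",
                         "John Murphy", "Gabriel Moore", "Samuel Moore",
                         "John Gayle", "Clement Comer Clay", "Hugh McVay",
                         "Arthur Pendleton Bagby"),
  took_office = c("November 9, 1819", "July 10, 1820", "November 9, 1821",
                  "November 25, 1825", "November 25, 1829", "March 3, 1831",
                  "November 26, 1831", "November 21, 1835", "July 17, 1837",
                  "November 21, 1837"),
  left_office = c("July 10, 1820", "November 9, 1821", "November 25, 1825",
                  "November 25, 1829", "March 3, 1831", "November 26, 1831",
                  "November 21, 1835", "July 17, 1837", "November 21, 1837",
                  "November 22, 1841"),
  stringsAsFactors = FALSE
)

# Same format string as the conversion above; %B assumes an English (or C) time locale.
terms$took_office <- as.Date(terms$took_office, format = "%B %d, %Y")
terms$left_office <- as.Date(terms$left_office, format = "%B %d, %Y")

# Subtracting two Date vectors yields a difftime; store it as a plain number of days.
terms$days_in_office <- as.numeric(terms$left_office - terms$took_office, units = "days")

shortest <- terms[which.min(terms$days_in_office), c("governor_full_name", "days_in_office")]
print(shortest)
```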
Revisiting the Imported Dataframe
While reviewing the imported data frame, I ran into duplicate governor seat numbers that led to NAs. Despite the warnings and some syntax complexity, I kept refining the process so that the results stay accurate and the fixes do not introduce unintended side effects.
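One way to locate duplicate seat numbers is to look for repeated (state, seat order) pairs. This is a sketch with hypothetical rows, not the post's original code. Note that `duplicated()` flags only the second and later copies of a key, so combining it with a `fromLast = TRUE` pass returns every row in each duplicated group.

```r
# Hypothetical rows: "State A" lists seat 2 twice.
govs <- data.frame(
  state_full_name     = c("State A", "State A", "State A", "State B", "State B"),
  governor_seat_order = c(1, 2, 2, 1, 2),
  governor_full_name  = c("Gov 1", "Gov 2", "Gov 3", "Gov 4", "Gov 5"),
  stringsAsFactors    = FALSE
)

# The key that should be unique: one seat number per state.
key <- govs[, c("state_full_name", "governor_seat_order")]

# Forward pass marks later copies; backward pass marks earlier copies.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_rows <- govs[is_dup, ]
print(dup_rows)
```

Whether a repeated pair is a data error or an intended record still has to be checked against a reference source.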
Tackling NAs
The as.Date conversion turned 21 values into NAs, which called for a closer look. I addressed them systematically: replace the problematic values, convert again, and save the result, prioritizing accuracy and completeness.
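To see which raw strings cause the NAs, compare the values before and after conversion: a failure is a value that was present before as.Date but is NA after it. The input strings below are hypothetical, chosen to show two typical failures (a misspelled month and an impossible day).

```r
# Hypothetical raw strings: one valid, two malformed, one already missing.
raw_dates <- c("November 9, 1819", "Novmber 25, 1825", "July 32, 1837", NA)

# %B assumes an English (or C) time locale.
parsed <- as.Date(raw_dates, format = "%B %d, %Y")

# Keep only values that existed before parsing but failed to convert,
# so inputs that were already missing are not counted as conversion errors.
failed <- raw_dates[!is.na(raw_dates) & is.na(parsed)]
print(failed)
```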
Replace NAs
The final step replaces the problematic dates in the took_office and left_office columns. I cross-referenced each correction with NGA.org (the National Governors Association) and checked the replacements carefully to keep the dataset complete and accurate. The result is a refined dataset with no remaining NAs, marking the completion of the EDA project.
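Replacing known-bad strings before converting can be done with a lookup of exact corrections. In this sketch the typos, the `fixes` lookup, and the helper name `replace_known` are all hypothetical placeholders; in the post, the correct dates came from NGA.org.

```r
# Hypothetical raw took_office strings containing two typos.
took_office_raw <- c("November 9, 1819", "Juyl 10, 1820", "Novmber 25, 1825")

# Hypothetical lookup of exact corrections (malformed string -> corrected string).
fixes <- c(
  "Juyl 10, 1820"    = "July 10, 1820",
  "Novmber 25, 1825" = "November 25, 1825"
)

# Hypothetical helper: swap in corrections for exact matches, leave everything else as-is.
replace_known <- function(x, lookup) {
  hit <- !is.na(x) & x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

took_office_fixed <- replace_known(took_office_raw, fixes)
# %B assumes an English (or C) time locale.
took_office <- as.Date(took_office_fixed, format = "%B %d, %Y")

# After the replacements, no value should fail the conversion.
stopifnot(!anyNA(took_office))
print(took_office)
```

Matching whole strings rather than patching substrings keeps every correction explicit and easy to audit against the reference source.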
```
## # A tibble: 1 x 8
##   state_full_name state_abbrev governor_seat_order governor_full_na~ took_office
##             <int>        <int>               <int>             <int>       <int>
## 1               0            0                   0                 0           0
## # ... with 3 more variables: left_office <int>, party_affiliation <int>,
## #   party_abbrev <int>
```
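The one-row summary above counts NAs per column. In base R, `colSums(is.na(df))` produces the same kind of count; with dplyr, `summarise(across(everything(), ~ sum(is.na(.x))))` returns a one-row tibble like the one printed, though that may not be the exact code behind it. The data frame below is hypothetical and deliberately keeps one missing date so the count is visible.

```r
# Hypothetical data frame using the tidied column names, with one missing date.
df <- data.frame(
  state_full_name = c("State A", "State B"),
  took_office     = as.Date(c("1819-11-09", NA)),
  left_office     = as.Date(c("1820-07-10", "1821-11-09")),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(df))
print(na_counts)

# A fully cleaned dataset should report zero everywhere.
all_clean <- all(na_counts == 0)
print(all_clean)
```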