Marcella Harris · DataPortfolio

Cleaned Nashville Housing Dataset

June 21, 2022 by Marcella

This project aimed to improve the dataset's performance through a comprehensive process of cleaning, restructuring, organizing, and normalizing. The dataset, originally sourced from AlexTheAnalyst on GitHub as part of a Data Analyst Portfolio Project, was transformed using SQL queries in SQL Server Management Studio 2018.

The dataset was initially organized in Microsoft Excel with 19 columns (18 general columns and 1 date column) and 56,477 rows. It posed several challenges: some columns were in all capital letters, month names were in title case, and the "SoldAsVacant" column mixed four responses (No, N, Yes, and Y). Many columns also contained a significant number of null values, including OwnerName, OwnerAddress, Acreage, TaxDistrict, LandValue, BuildingValue, TotalValue, YearBuilt, Bedrooms, FullBath, and HalfBath.
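The mixed "SoldAsVacant" responses can be collapsed into two values with a CASE expression. The following is a minimal sketch rather than the project's actual query: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name `housing`, the `UniqueID` column, and the four sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing (UniqueID INTEGER, SoldAsVacant TEXT)")

# Hypothetical sample rows covering the four response variants.
conn.executemany(
    "INSERT INTO housing VALUES (?, ?)",
    [(1, "No"), (2, "N"), (3, "Yes"), (4, "Y")],
)

# Map the abbreviations to full words; leave any other value untouched.
conn.execute(
    """
    UPDATE housing
    SET SoldAsVacant = CASE SoldAsVacant
        WHEN 'Y' THEN 'Yes'
        WHEN 'N' THEN 'No'
        ELSE SoldAsVacant
    END
    """
)

# Count the remaining distinct responses to confirm the cleanup.
normalized = conn.execute(
    "SELECT SoldAsVacant, COUNT(*) FROM housing "
    "GROUP BY SoldAsVacant ORDER BY SoldAsVacant"
).fetchall()
print(normalized)
```

The same CASE expression should carry over to SQL Server largely unchanged, although that has not been verified against the original queries.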

The initial data types, null totals, and distinct-value counts were not recorded during the transition to SQL Server. Even so, the dataset was transformed through multiple queries, each grouped by the task it performs. The result is a cleaned dataset with 21 columns and 56,477 rows (excluding the header), produced by a total of 26 query statements. This approach aimed not only to optimize performance but also to leave the dataset clean, structured, organized, and normalized for better analytical insight.
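A baseline of null totals and distinct values, which was not captured for this project, could be recorded before cleaning with a query along these lines. This is only a sketch under stated assumptions: SQLite (via Python's sqlite3) stands in for SQL Server, the `profile` helper is hypothetical, and the four sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing (OwnerName TEXT, Acreage REAL)")

# Hypothetical sample rows with some missing values.
conn.executemany(
    "INSERT INTO housing VALUES (?, ?)",
    [("A. Smith", 0.5), (None, None), ("B. Jones", None), ("A. Smith", 0.5)],
)


def profile(conn, table, columns):
    """Return {column: (null_count, distinct_count)} for the given columns.

    Hypothetical helper. Table and column names are interpolated into the
    SQL text, so pass only trusted identifiers.
    """
    result = {}
    for col in columns:
        # COUNT(*) counts every row while COUNT(col) skips NULLs,
        # so their difference is the number of NULLs in the column.
        # COUNT(DISTINCT col) also ignores NULLs.
        nulls, distinct = conn.execute(
            f'SELECT COUNT(*) - COUNT("{col}"), COUNT(DISTINCT "{col}") '
            f'FROM "{table}"'
        ).fetchone()
        result[col] = (nulls, distinct)
    return result


stats = profile(conn, "housing", ["OwnerName", "Acreage"])
print(stats)
```

Running the same profile before and after cleaning would give a simple way to compare the two states of the dataset.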
