Sports Stats - A Postgres Database and Python Application
Project Overview: This project leverages object-oriented programming (OOP) principles and data engineering techniques to design and implement a PostgreSQL database for sports statistics. The database schema includes four core tables: seasons, leagues, teams, and standings. These table relationships are visualized in the Entity Relationship Diagram (ERD) below and can be queried or joined within PostgreSQL for analysis.
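To illustrate the kind of join the four tables allow, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL so that it runs without a server. Every column name (league_id, season_id, team_id, year, wins, losses) and every sample row is an assumption made for illustration; the actual schema is the one shown in the ERD.

```python
import sqlite3


def build_demo_db():
    """Create the four tables in memory with hypothetical columns and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE leagues (league_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE seasons (
            season_id INTEGER PRIMARY KEY,
            league_id INTEGER REFERENCES leagues (league_id),
            year INTEGER
        );
        CREATE TABLE teams (
            team_id INTEGER PRIMARY KEY,
            league_id INTEGER REFERENCES leagues (league_id),
            name TEXT
        );
        CREATE TABLE standings (
            team_id INTEGER REFERENCES teams (team_id),
            season_id INTEGER REFERENCES seasons (season_id),
            wins INTEGER,
            losses INTEGER
        );
        -- Made-up placeholder rows, not real statistics.
        INSERT INTO leagues VALUES (1, 'League A');
        INSERT INTO seasons VALUES (10, 1, 2024);
        INSERT INTO teams VALUES (100, 1, 'Team X'), (101, 1, 'Team Y');
        INSERT INTO standings VALUES (100, 10, 50, 32), (101, 10, 41, 41);
        """
    )
    return conn


def standings_report(conn):
    """Join all four tables into one row per team and season, best record first."""
    query = """
        SELECT l.name, s.year, t.name, st.wins, st.losses
        FROM standings AS st
        JOIN teams AS t ON t.team_id = st.team_id
        JOIN seasons AS s ON s.season_id = st.season_id
        JOIN leagues AS l ON l.league_id = t.league_id
        ORDER BY st.wins DESC
    """
    return conn.execute(query).fetchall()
```

The SELECT statement itself is standard SQL, so the same query text should work in PostgreSQL once the column names are adjusted to match the real schema.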
More: SportsStats on GitHub
Implementation Details: The codebase is organized into a models.py script, where classes are categorized as either Data Classes or Table Classes:
Data Classes: These classes handle data retrieval for seasons, leagues, teams, and team standings. Each class contains methods that fetch the relevant data, along with top-level dictionaries that store the processed results.
Table Classes: These classes are responsible for building and managing database tables using the SQLAlchemy library.
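The split between Data Classes and Table Classes can be sketched roughly as follows. This is not the code in models.py: the class names (TeamData, LeagueTable, TeamTable), the columns, the load helper, and the hard-coded sample records are all hypothetical, and an in-memory SQLite engine stands in for PostgreSQL. The sketch assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class LeagueTable(Base):
    """Table class (hypothetical columns): one row per league."""
    __tablename__ = "leagues"
    league_id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


class TeamTable(Base):
    """Table class (hypothetical columns): one row per team, linked to a league."""
    __tablename__ = "teams"
    team_id = Column(Integer, primary_key=True)
    league_id = Column(Integer, ForeignKey("leagues.league_id"), nullable=False)
    name = Column(String, nullable=False)


class TeamData:
    """Data class: fetches team records and stores them in a dictionary."""

    def __init__(self):
        self.teams = {}  # league name -> list of team names

    def fetch(self):
        # Stand-in for a real API request; returns made-up sample records.
        return [
            {"league": "League A", "team": "Team X"},
            {"league": "League A", "team": "Team Y"},
            {"league": "League B", "team": "Team Z"},
        ]

    def process(self):
        for record in self.fetch():
            self.teams.setdefault(record["league"], []).append(record["team"])
        return self.teams


def load(engine, teams_by_league):
    """Create the tables and insert one league row plus its team rows."""
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        for league_name, team_names in teams_by_league.items():
            league = LeagueTable(name=league_name)
            session.add(league)
            session.flush()  # assigns league.league_id before teams reference it
            session.add_all(
                [TeamTable(league_id=league.league_id, name=n) for n in team_names]
            )
        session.commit()
```

For a quick trial, `load(create_engine("sqlite:///:memory:"), TeamData().process())` builds and fills both tables. Pointing the same code at PostgreSQL should only require a different engine URL, for example one of the form `postgresql+psycopg2://user:password@host/dbname`.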
Challenges Encountered and Future Improvements: One primary challenge was obtaining accurate data, particularly from the NBA API. Finding the right request headers proved difficult, and some data was occasionally unavailable as a result. As a temporary workaround, I populated the seasons table manually, which is something I would like to automate in the future.
To optimize this codebase, I plan to make the following improvements:
Data Structure Standardization: Refactor dictionaries within the data classes to adopt a uniform nested structure, where each league or team serves as a key.
Codebase Refactoring: Improve consistency in class methods by adhering more strictly to OOP design principles. This includes standardizing function names and parameters, ensuring modularity, and enhancing maintainability.
Reproducibility: Optimize data retrieval and processing workflows to guarantee consistent, accurate, and reproducible results as the database evolves.
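As one possible shape for the uniform nested structure described in the first item above, the sketch below groups flat records into a dictionary keyed by league and then by team. The function name nest_by_league and the record fields (league, team, plus any remaining stats) are assumptions for illustration, not the project's current layout.

```python
from collections import defaultdict


def nest_by_league(records):
    """Group flat records into {league: {team: {stat: value}}}."""
    nested = defaultdict(dict)
    for record in records:
        # Everything other than the two key fields is treated as a stat.
        stats = {k: v for k, v in record.items() if k not in ("league", "team")}
        nested[record["league"]][record["team"]] = stats
    return dict(nested)


# Made-up placeholder records, not real statistics.
sample = [
    {"league": "League A", "team": "Team X", "wins": 50, "losses": 32},
    {"league": "League A", "team": "Team Y", "wins": 41, "losses": 41},
    {"league": "League B", "team": "Team Z", "wins": 10, "losses": 6},
]
by_league = nest_by_league(sample)
```

With every data class exposing the same league-then-team layout, a lookup such as `by_league["League A"]["Team X"]["wins"]` works identically for any league, which also makes it easier to write one generic loader for the table classes.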