jdeko.me is almost entirely self-hosted - a cloudflare tunnel routes HTTPS traffic from the outside internet to a raspberry pi 5 single-board computer in my home office! my nginx docker container then routes HTTP requests to my nba/wnba stats api container or serves static files directly back to the client's browser. this raspberry pi has a 4-core CPU & 16GB of RAM, which is more than enough to handle any traffic this site receives
sourcing nba/wnba stats with go
all data is sourced from nba.com with the go-etl program referenced above. the program's cli lets me run the etl code in different ways from different scripts. it's used in "build" mode in my database build script (/scripts/bld.sh in the github repo) to fetch & insert data for every nba & wnba regular season/postseason game since 1970. it's also run in "daily" mode by a cronjob (which runs /scripts/dly.sh) every night at approximately midnight to fetch & insert data only for the previous day's games
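as a rough sketch (with illustrative flag names & function stubs, not the actual go-etl interface), the mode switch looks something like this:

package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"time"
)

func main() {
	mode := flag.String("mode", "daily", `"build" = full backfill since 1970, "daily" = previous day only`)
	flag.Parse()

	switch *mode {
	case "build":
		// full historical backfill, one season at a time
		for season := 1970; season <= time.Now().Year(); season++ {
			if err := runSeason(season); err != nil {
				log.Fatalf("season %d failed: %v", season, err)
			}
		}
	case "daily":
		// only the previous day's games
		if err := runDay(time.Now().AddDate(0, 0, -1)); err != nil {
			log.Fatalf("daily run failed: %v", err)
		}
	default:
		fmt.Fprintf(os.Stderr, "unknown mode %q\n", *mode)
		os.Exit(1)
	}
}

// runSeason & runDay stand in for the real fetch & insert logic
func runSeason(season int) error { log.Printf("loading season %d", season); return nil }
func runDay(d time.Time) error   { log.Printf("loading games for %s", d.Format("2006-01-02")); return nil }

with a switch like that, the build & daily shell scripts only need to invoke the binary with the right mode flag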
/-|-\
the go-etl process takes advantage of go's concurrency features to make several http requests to nba.com in quick succession. the package then processes and structures the data to match the postgres database design, splits the large volume of data into small chunks, and inserts those chunks concurrently
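roughly, that fetch-then-insert pattern looks like the sketch below (placeholder endpoints, chunk size, & insert logic - not the real go-etl code):

package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	// placeholder endpoints - the real program builds these per game/date
	endpoints := []string{
		"https://stats.nba.com/stats/scoreboardv2?GameDate=2024-01-01",
		"https://stats.nba.com/stats/scoreboardv2?GameDate=2024-01-02",
	}

	// fire the http requests concurrently, one goroutine per endpoint
	var wg sync.WaitGroup
	bodies := make([][]byte, len(endpoints))
	for i, url := range endpoints {
		wg.Add(1)
		go func(i int, url string) {
			defer wg.Done()
			resp, err := http.Get(url)
			if err != nil {
				log.Printf("fetch %s: %v", url, err)
				return
			}
			defer resp.Body.Close()
			bodies[i], _ = io.ReadAll(resp.Body)
		}(i, url)
	}
	wg.Wait()

	// ... parse bodies into rows shaped like the postgres tables ...
	rows := make([][]any, 10000) // stand-in for the parsed rows

	// split the rows into small chunks & insert each chunk concurrently
	const chunkSize = 500
	for start := 0; start < len(rows); start += chunkSize {
		end := start + chunkSize
		if end > len(rows) {
			end = len(rows)
		}
		wg.Add(1)
		go func(chunk [][]any) {
			defer wg.Done()
			insertChunk(chunk) // placeholder for the real batched INSERT
		}(rows[start:end])
	}
	wg.Wait()
}

// insertChunk stands in for a batched database insert
func insertChunk(chunk [][]any) { log.Printf("inserted %d rows", len(chunk)) }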
/-|-\
this site was originally powered by a MariaDB database with data sourced from nba.com using the nba_api python package. this system worked well, but i wanted to learn more of the lower-level http concepts that package abstracts away, so i decided to rewrite the entire etl process, with my own http requests, in Go. the nba_api documentation was incredibly helpful in figuring out this process
legacy python ETL | py-nba-mdb
storing the stats in postgres
all stats on the site are served from a postgres database server running in a docker container. the database was designed following the data normalization principles outlined in Codd's third normal form
/-|-\
the database is built by a single shell script that builds & runs the docker container (configured in the Dockerfile & compose.yaml files), executes SQL statements to create all schemas, tables, procedures, etc., uses the go-etl cli to source & insert nba/wnba data since 1970, and runs several stored procedures to process & load the inserted data into their destination tables
/-|-\
the go-etl program inserts data only into the tables in the intake schema. each table in this schema is designed to match the structure of the json response from a specific endpoint on nba.com. this keeps the changes made to the source data before insertion to a minimum, which makes errors less likely and the pipeline more maintainable.
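for illustration, nba.com stats endpoints typically return each result set as a list of column headers plus rows of values, so the decoding side can stay very thin. the sketch below shows that shape being unmarshalled (the struct & sample payload are illustrative, not the exact go-etl types):

package main

import (
	"encoding/json"
	"fmt"
)

// statsResponse mirrors the typical shape of an nba.com stats payload:
// each result set is a list of column headers plus rows of values
type statsResponse struct {
	ResultSets []struct {
		Name    string   `json:"name"`
		Headers []string `json:"headers"`
		RowSet  [][]any  `json:"rowSet"`
	} `json:"resultSets"`
}

func main() {
	// trimmed-down sample payload for illustration
	raw := []byte(`{"resultSets":[{"name":"LineScore",
		"headers":["GAME_ID","TEAM_ABBREVIATION","PTS"],
		"rowSet":[["0022300001","DEN",119],["0022300001","LAL",107]]}]}`)

	var resp statsResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		panic(err)
	}

	// each rowSet row already lines up with the columns of the matching
	// intake table, so it can be inserted with minimal reshaping
	for _, rs := range resp.ResultSets {
		fmt.Println(rs.Name, rs.Headers)
		for _, row := range rs.RowSet {
			fmt.Println(row)
		}
	}
}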
the jdeko.me/bball api primarily interacts with the database's api schema, which contains tables designed specifically for quickly accessing aggregated player stats. the data in these tables is deleted and reaggregated each night after new data is inserted into the database
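that nightly refresh boils down to calling those stored procedures once the daily insert finishes. a sketch of that step with database/sql & the lib/pq driver (the procedure name & connection string are placeholders, not the real ones):

package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // postgres driver
)

func main() {
	// placeholder connection string
	db, err := sql.Open("postgres", "postgres://etl:secret@localhost:5432/bball?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	// wipe & rebuild the aggregated player-stats tables in the api schema
	// (placeholder procedure name)
	if _, err := db.ExecContext(ctx, "CALL api.refresh_player_stats()"); err != nil {
		log.Fatalf("refresh failed: %v", err)
	}
	log.Println("api tables reaggregated")
}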