===================
== Nathan's Blog ==
===================
infrequent posts about things I am working on

Drawing of a man looking through an optic in an early submarine

Welcome to my blog! I post infrequently about things I am working on, mostly related to computers and software.

The blog is hosted on GitHub Pages and built with Hugo using the smol theme.

Upload a Pandas Dataframe to AWS S3 With Ease

Uploading a Pandas dataframe to S3 is different from writing the dataframe to a local filesystem. But have no fear! It is easy once you understand a couple of key concepts. Here is a working example using boto3.resource("s3") that has been tested against pandas 1.3.2. It is worth noting that the following will only work with pandas versions greater than 1.2.0. from io import BytesIO import boto3 import pandas from pandas import util df = util. Read more...

Moving My Personal Blog From WordPress to Hugo

hugo website hosting
I started the process of moving my personal blog over a year ago, when the global pandemic brought on by the COVID-19 virus sent my local area into lockdown. The reason for doing so was simple enough: my web hosting bill had grown north of $200 a month for my personal blog. Don’t get me wrong, WordPress is great! I just wanted to get my blog onto something more appropriate for the audience (read: pretty small). Read more...

Dynamically Set ORM Schemas via SQLAlchemy

data databases sqlalchemy orm
Sometimes the solution to a problem is so obvious, it takes a while to figure it out. I recently stumbled on such a problem when trying to configure a set of Object Relational Mappings (ORM) to support an application with the same set of table objects across different schemas in Postgres. Developing an ORM to support this pattern, a multi-tenant database model, proved challenging because of where I started. Below, I will detail the correct way to support the multi-tenant pattern as well as various approaches I came across and why they should not be used. Read more...

A Primer on Data Normalization

data databases EF Codd normalization
Normalizing data is a common data engineering task. It prepares information to be stored in a way that minimizes duplication and is digestible by machines. It also aims to solve other problems that are out of scope for this particular article but worth reading about if you find yourself struggling to understand jokes about E. F. Codd. This raises the question: why does normalization matter when entering information in a table or organizing a spreadsheet? Read more...
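To make "minimizes duplication" concrete, here is a minimal sketch in Python. The rows, column names, and table names are made up for illustration and do not come from the article. It splits a flat list of orders, where customer details repeat on every row, into a customer table and an order table linked by an ID.

```python
# Illustrative data only: customer name and email repeat on every order row.
flat_rows = [
    {"order_id": 1, "customer": "Ada", "email": "ada@example.com", "item": "keyboard"},
    {"order_id": 2, "customer": "Ada", "email": "ada@example.com", "item": "mouse"},
    {"order_id": 3, "customer": "Grace", "email": "grace@example.com", "item": "monitor"},
]

customer_ids = {}    # email -> customer_id, used to spot repeats
customer_table = []  # one row per distinct customer
order_table = []     # one row per order, pointing at a customer by ID

for row in flat_rows:
    email = row["email"]
    if email not in customer_ids:
        # First time we see this customer: assign the next ID and store them once.
        customer_ids[email] = len(customer_ids) + 1
        customer_table.append(
            {"customer_id": customer_ids[email], "name": row["customer"], "email": email}
        )
    order_table.append(
        {"order_id": row["order_id"], "customer_id": customer_ids[email], "item": row["item"]}
    )

print(customer_table)
print(order_table)
```

After the split, a change to a customer's email touches one row instead of every order that customer ever placed.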

Deals, Deals, Deals

Wondering whether your favorite tools, services, or products are on sale this week? Below is a list of Cyber Week deals to help you get started with Data Engineering, refresh your toolbox, or launch your side project. Feel free to add to the list over on GitHub.

Let PyCharm Use WSL’s Git Executable

This post is mostly for me, but I ran into a ton of conflicting information while troubleshooting my Windows Subsystem for Linux (WSL) and PyCharm integration and figured it may help someone else. First things first: versions matter! Before wasting your time trying to get PyCharm and WSL to play nicely, make sure you are running PyCharm 2020.2 or greater and WSL 2. If you a) have no idea what those versions mean or b) are not sure what version you are using, allow me a chance to explain. Read more...

Speed Up Your REST Workflows with asyncio

API concurrent python REST
I have been waiting for a project that would allow me to dig into Python’s asyncio library. Recently, such a project presented itself. I was tasked with hitting a rate-limited REST API with just under 4 million requests. My first attempt was simple: gather and build a block of search queries, POST each one to the API, process the results, and finally insert them in a database. Here is what the code looked like: Read more...
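This is not the code from the post. It is a minimal sketch of how asyncio can keep several requests in flight at once while capping concurrency with a semaphore. The `fake_post` coroutine, the `MAX_IN_FLIGHT` value, and the query strings are hypothetical stand-ins; a real version would swap in an async HTTP client. Capping in-flight requests is only one simple way to be gentle on an API, and a strict per-second rate limit would need additional logic.

```python
import asyncio

MAX_IN_FLIGHT = 5  # assumed cap on concurrent requests, not a value from the post


async def fake_post(query):
    """Hypothetical stand-in for an HTTP POST: sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return {"query": query, "status": 200}


async def fetch_all(queries, limit=MAX_IN_FLIGHT):
    """Run fake_post for every query with at most `limit` running at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(query):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fake_post(query)

    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*(bounded(q) for q in queries))


results = asyncio.run(fetch_all(["query-{}".format(i) for i in range(20)]))
print(len(results), "responses")
```

Because the waiting happens inside `await`, the event loop can start other requests while earlier ones are still pending, which is where the speedup over a one-at-a-time loop comes from.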

How to Get the First N Bytes of a File

big files bytes linux powershell tutorial wc
There comes a time when you just need to take a little off the top of a file, see what you are working with. That is where knowing how to use a utility like head (http://man7.org/linux/man-pages/man1/head.1.html) can help. Just running head on a file will get you its first 10 lines. But what if that file does not have nice lines? Large SQL dump files come to mind. head has an answer. Use the -c flag to print the beginning bytes of a file instead of lines. Read more...
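For anyone who would rather stay in Python than the shell, the same idea as `head -c` can be sketched in a few lines. The function name `first_bytes` and the demo file contents are assumptions for illustration, not taken from the post.

```python
import os
import tempfile


def first_bytes(path, n):
    """Return up to the first n bytes of the file at path, similar to `head -c n`."""
    with open(path, "rb") as handle:
        # read(n) stops after n bytes, so a huge file is never loaded whole.
        return handle.read(n)


# Demo on a throwaway file that contains no newlines at all.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"INSERT INTO t VALUES (1),(2),(3);" * 1000)
    demo_path = tmp.name

preview = first_bytes(demo_path, 16)
print(preview)

os.remove(demo_path)  # clean up the demo file
```

Opening the file in binary mode matters here: it counts raw bytes, just as `head -c` does, rather than decoded characters.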

Search for a String in a list of Encrypted Values

Imagine a scenario where one party wants to check whether a name they have exists in a list of names kept by another party, but does not want the other party to know what name they are searching for. This problem may seem unrealistic, but imagine a data breach where tons of personal information is leaked. You want to check whether you were impacted in the breach but do not trust the party hosting the personal information to keep your query safe. Read more...

Your Simple Guide to Collecting Oral History

audio family history FLAC formats getting started interviews ocen audio ocenaudio oral history process WAV
Collecting memories from people is an excellent way to celebrate the experience of others. I have found it helps me learn more about why people hold certain beliefs, how they overcame hardships, and the world we live in. Interviewing other people has helped me learn more about myself, which is why I wanted to write up a guide for collecting the stories of other people. The most obvious aspect of collecting stories is interviewing. Read more...