Client-side instrumentation for under $1 per month. No servers necessary.

In a world where the importance of data is steadily increasing yet the cost of computing power is steadily decreasing, there are fewer and fewer excuses to not have control of your own data. To explore that point I instrumented this site as inexpensively as I possibly could, without sacrificing reliability or functionality. I have full control of all data that is generated, the instrumentation is highly customizable, the output is simple to use, and I don’t have to be available at all hours to keep it working. »

Built to Scale: Running Highly-Concurrent ETL with Apache Airflow (part 1)

Apache Airflow has seemingly taken the data engineering world by storm. It was originally created and maintained by Airbnb, and has been part of the Apache Software Foundation for several years now. After heavily leveraging it for about a year (almost 2 million ¡idempotent! ETL tasks later) and seeing both its full potential and its numerous drawbacks, I was tasked with streamlining the deployment and operation of the system. The obvious first step? »

Why Your Company Should Own Its Own Data

When considering software and related infrastructure, the business of today is caught in a never-ending cycle of “build vs. buy”. Many third-party companies solve serious challenges such as managing sales pipelines, accounting automation, payment processing, and internal communication. These alternatives to “building it yourself” empower companies to operate faster or more efficiently, and the overall benefit to the customer is often net-positive. There is, however, one critical component of your business that you should think twice about leaving in the hands of third parties: your data and supporting data infrastructure. »