GDPR for Engineers - What You Need to Know

GDPR was approved by the EU Parliament on April 14, 2016, became enforceable on May 25, 2018, and affects any business handling the personal data of any EU resident. Many businesses still do not comply with its requirements, however, and for other organizations the implementation remains a complete mystery. The following is an overview of what the law encompasses and why you should care. If you don’t have a thorough understanding of what personal data means exactly, there’s a post about that here. »

GDPR for Engineers - What is Personal Data?

We all know that GDPR (also known as RGPD in France) has brought data policy into the spotlight for many technical organizations. As of May 25, 2018, if your systems (both automated and otherwise!) handle PII of individuals residing in the EU, you must comply with the regulation. While this enforcement date makes the topic seem like old news, many US-based companies are unclear on the specifics and vastly underprepared to deal with the implications. »

Client-side instrumentation for under $1 per month. No servers necessary.

In a world where the importance of data is steadily increasing yet the cost of computing power is steadily decreasing, there are fewer and fewer excuses to not have control of your own data. To explore that point I instrumented this site as inexpensively as I possibly could, without sacrificing reliability or functionality. I have full control of all data that is generated, the instrumentation is highly customizable, the output is simple to use, and I don’t have to be available at all hours to keep it working. »

Built to Scale: Running Highly-Concurrent ETL with Apache Airflow (part 1)

Apache Airflow has seemingly taken the data engineering world by storm. It was originally created and maintained by Airbnb, and has been part of the Apache Software Foundation for several years now. After heavily leveraging it for about a year (almost 2 million ¡idempotent! ETL tasks later) and seeing its full potential (but also its numerous drawbacks), I was tasked with streamlining the deployment and operation of the system. The obvious first step? »

Why Your Company Should Own Its Own Data

When considering software and related infrastructure, the business of today is caught in a never-ending cycle of “build vs. buy”. Many third-party companies solve serious challenges such as managing sales pipelines, accounting automation, payment processing, and internal communication. These alternatives to “building it yourself” empower companies to operate faster or more efficiently, and the overall benefit to the customer is often net-positive. When considering the various alternatives, however, there is one critical component of your business that you should strongly reconsider leaving in the hands of third parties: your data and supporting data infrastructure. »