The effect of client batching when using processing-time for aggregations
Analyzing metrics from thousands or millions of clients typically requires aggregations for downstream analysis.
Building a unified data pipeline means that you will likely need to choose between two mainstream messaging systems:
The extendList function defaults myList to an empty list. However, myList is created only once, when the function is first defined. Therefore subsequent calls to e...
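A minimal sketch of this gotcha, assuming extendList looks roughly like the classic interview example (the exact signature is an assumption):

```python
def extendList(val, myList=[]):
    # The default list is created once, at function-definition time,
    # so every call that omits myList shares the same list object.
    myList.append(val)
    return myList

print(extendList(1))      # [1]
print(extendList(2))      # [1, 2] -- the default list persisted between calls
print(extendList(3, []))  # [3]    -- passing a fresh list avoids the sharing
```

The idiomatic fix is to default to `None` and create a new list inside the function body on each call.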
Preparing for a technical interview with Python means that you should have a decent understanding of the following concepts.
An example of a closure is when a function depends on a variable outside its own scope. A more specific definition from a Stack Overflow post states:
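A short illustrative sketch of a closure (the `make_counter` name is hypothetical, not from the post): the inner function captures the free variable `count` from the enclosing scope and keeps it alive across calls.

```python
def make_counter():
    count = 0  # free variable captured by the closure below

    def increment():
        nonlocal count  # rebind the enclosing variable, not a local one
        count += 1
        return count

    return increment

counter = make_counter()
counter()  # 1
counter()  # 2 -- state survives between calls without a class or global
```

Each call to `make_counter` produces an independent counter with its own captured `count`.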
How do you scale a write/read-heavy application?
A data pipeline is a method for shipping data efficiently to various services throughout your system. It also provides a framework that supports stream process...
The decorator pattern is simply a wrapper that is used to extend the behavior of a function without actually modifying the function.
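A minimal sketch of the pattern in Python (the `log_calls` decorator and `add` function are illustrative names, not from the original post):

```python
import functools

def log_calls(func):
    # The wrapper extends func's behavior (here, logging the call)
    # without changing func's own code.
    @functools.wraps(func)  # preserve func's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_calls
def add(a, b):
    return a + b

add(2, 3)  # prints "calling add", then returns 5
```

`functools.wraps` is worth the extra line: without it, the decorated function would report its name as `wrapper`, which confuses debugging and introspection.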
The singleton pattern ensures that only one instance of an object can be instantiated. The code snippet (from tutorialspoint) below is an example of a single...
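A compact Python sketch of the same idea (this is not the tutorialspoint snippet, just one common way to express it):

```python
class Singleton:
    _instance = None  # class-level slot for the single instance

    def __new__(cls):
        # Create the instance only on the first call; every later
        # instantiation returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Singleton()
b = Singleton()
assert a is b  # both names refer to the one and only instance
```

Overriding `__new__` (rather than `__init__`) is what lets the class intercept object creation itself.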
Virtual methods allow subclass methods to be called even when the pointer is of the base-class type. The code snippet is taken from Stack Overflow:
Apache Spark comes bundled with several launch scripts to start the master and worker processes. We can launch a master and two worker processes by executing
The most efficient way to load data into AWS Redshift is to first upload your data to S3 and then execute the COPY command on Redshift. From the documentation
What you need to know when setting up Spark Streaming with AWS Kinesis.
How to avoid dependency hell