Zokkia

Posts

Building a Data Lake: Step by Step

October 08, 2020

Published on: October 29, 2019
Linda Feng, Software Architect

At Unicon, a common challenge that we hear about is this: “We have lots of data everywhere, but we want a way to bring the data into one place so that we can analyze it and use it to ‘see’ how we are doing, possibly to spot trends and ultimately to inform decision making.”

Many universities and school districts today are in various stages of implementing systems to enable data collection for useful analytics. A common problem is that they want to collect different types of data and combine them in meaningful ways. And while there are many more factors that contribute to the success of transformative uses of data on campuses, one hurdle that IT administrators face is how to get their “data house” in order.

During the last few years, I’ve spent most of my time helping customers assemble a variety of data sources into a data lake. What we have seen that works best is to first think throu...

Building Serverless Data Lake Pipeline on AWS

October 07, 2020

Today I’d like to talk about building a serverless data lake on AWS. The reason for writing this post is to share my thinking with the world, to get feedback on my prototype and vision, and, at the same time, to share experiences that may be of interest to data engineering practitioners and others.

General Data Lake Pipeline

What I’d like to do is start with what a modern data lake pipeline looks like on AWS.

(Figure: Data lake pipeline)

Generate

The first stage is generation: producing the source data. In traditional applications, data typically comes from transactional legacy systems, ERP systems, and web logs. Increasingly it also comes from capturing information about consumers as they hit the website, and from sensor networks feeding data into the pipeline.

Collection

The next stage is collection. Here you might see polling services running on EC2 that go out to enterprise systems to poll data from file systems or databases. Mode...
