According to the WHO, 7 million people die every year from exposure to fine particles in polluted air, which lead to diseases such as stroke, heart disease, lung cancer, chronic obstructive pulmonary disease, and respiratory infections, including pneumonia.
91% of the world’s population lives in places where air pollution exceeds WHO guideline limits.
Here we analyze global air quality data from OpenAQ:
Avro is an open-source, language-agnostic data serialization framework. An Avro schema is specified in JSON, which makes it easy to read and interpret. Files that store Avro data should always include the schema for that data in the same file.
Avro includes APIs for C, C++, C#, Java, JavaScript, Perl, PHP, Python, and Ruby. Because Avro is language agnostic, files stored using Avro can be passed between programs written in different languages.
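To make the "schema is JSON" point concrete, here is a minimal sketch of what a record schema might look like. The record name `Measurement`, its namespace, and its fields are hypothetical and chosen only for illustration. Because the schema is plain JSON, Python's standard `json` module is enough to inspect it; no Avro library is needed for this step.

```python
import json

# Hypothetical schema for illustration only: the record name, namespace,
# and fields are assumptions, not taken from the tutorial.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Measurement",
  "namespace": "example.avro",
  "fields": [
    {"name": "city", "type": "string"},
    {"name": "value", "type": "double"},
    {"name": "unit", "type": ["null", "string"], "default": null}
  ]
}
"""

# An Avro schema is ordinary JSON, so any JSON parser can read it.
schema = json.loads(SCHEMA_JSON)

# List the declared field names in schema order.
field_names = [field["name"] for field in schema["fields"]]
print(schema["name"], field_names)
```

The `["null", "string"]` union is the usual way to mark a field as optional in an Avro schema.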
You can find the source code for this tutorial here: https://github.com/ksree/apache-avro-demistified
An Avro file consists of a file header, followed by one or more file data blocks.
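The header layout can be sketched with nothing but the standard library. Going by the Avro specification's description of object container files, the header holds a four-byte magic marker, a metadata map that carries the schema, and a 16-byte sync marker. The helper names below (`write_header`, `read_header`, and so on) and the sample schema are my own, not part of any Avro API, and the sketch covers only the header, not the data blocks that follow it.

```python
import io
import json
import os

MAGIC = b"Obj\x01"   # four bytes that open every Avro container file
SYNC_SIZE = 16       # length of the sync marker that closes the header


def write_long(out, n):
    """Write an integer using zig-zag, variable-length encoding."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_long(buf):
    """Read a zig-zag, variable-length encoded integer."""
    shift = 0
    result = 0
    while True:
        byte = buf.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def write_bytes(out, data):
    """Write a length-prefixed byte string."""
    write_long(out, len(data))
    out.write(data)


def read_bytes(buf):
    """Read a length-prefixed byte string."""
    return buf.read(read_long(buf))


def write_header(out, schema, sync_marker):
    """Write magic bytes, the metadata map, and the sync marker."""
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",  # "null" means the data blocks are uncompressed
    }
    out.write(MAGIC)
    write_long(out, len(meta))          # one map block holding every entry
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))
        write_bytes(out, value)
    write_long(out, 0)                  # a zero count ends the map
    out.write(sync_marker)


def read_header(buf):
    """Parse a header and return (schema, codec, sync_marker)."""
    if buf.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:
            break
        if count < 0:                   # negative count: a byte size follows
            read_long(buf)
            count = -count
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    sync_marker = buf.read(SYNC_SIZE)
    schema = json.loads(meta["avro.schema"])
    return schema, meta.get("avro.codec", b"null"), sync_marker


# Round trip: write a header into memory, then read it back.
schema = {
    "type": "record",
    "name": "Measurement",  # hypothetical record, for illustration only
    "fields": [{"name": "city", "type": "string"}],
}
sync = os.urandom(SYNC_SIZE)
buffer = io.BytesIO()
write_header(buffer, schema, sync)
buffer.seek(0)
parsed_schema, codec, parsed_sync = read_header(buffer)
print(parsed_schema["name"], codec, parsed_sync == sync)
```

Storing the schema in the header is what lets a reader written in any language decode the file without outside information. In practice you would use an Avro library rather than hand-rolled encoding like this.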
…
We all read about the effects of climate change and experience them around us every day. We have seen numbers like these: the current global average temperature is 0.85°C higher than it was in the late 19th century, and each of the past three decades has been warmer than any preceding decade since records began in 1850*.
I got curious about how climatologists determine these numbers. A great deal of research is going on in this area, and I came across one important weather dataset from NOAA that is widely used in it.
In this blog, I will explain how I…
Five years ago, when I started working on enterprise big data platforms, the prevalent data lake architecture relied on a single public cloud provider or an on-prem platform. These data lakes quickly grew to several terabytes or even petabytes of structured and unstructured data (only 1% of unstructured data is analyzed or used at all). On-prem data lakes hit capacity limits, while single-cloud implementations risked so-called vendor lock-in.
Today, hybrid multi-cloud architectures that use two or more public cloud providers are the preferred strategy. 81% of public cloud users reported using two or more providers.