In this post I will walk through an example of how to analyze data in Apache Spark. The data set is a sample from the Instagram feeds that I talked about in my previous post. The post has two parts: the first prepares the data, and the second answers questions about it.
In this post I will describe an Instagram data set that I collected over one month (April 2015). I used the InstagramCSharp library to create 10 Instagram Real-Time geographic subscriptions and collected the data in cloud jobs. Each subscription covered one of 10 cities, centered on a randomly chosen longitude and latitude inside that city, with a radius of 5000 meters.
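
To make the subscription setup concrete, here is a rough sketch in Python (not the InstagramCSharp code I actually ran) of picking a random center inside each city and building the parameters for a geographic subscription. The city names, bounding boxes, and callback URL are made up for illustration, and the field names (`object`, `aspect`, `lat`, `lng`, `radius`, `callback_url`) are my assumption of what the Real-Time API expects, so treat them as placeholders.

```python
import random

# Hypothetical bounding boxes: (min_lat, max_lat, min_lng, max_lng).
# These are illustrative values, not the cities used for the data set.
CITY_BOUNDS = {
    "city_a": (40.70, 40.80, -74.02, -73.93),
    "city_b": (51.48, 51.54, -0.15, -0.05),
}

RADIUS_METERS = 5000  # radius used for every subscription


def random_point(bounds, rng):
    """Pick a uniformly random (lat, lng) inside a bounding box."""
    min_lat, max_lat, min_lng, max_lng = bounds
    return rng.uniform(min_lat, max_lat), rng.uniform(min_lng, max_lng)


def build_subscriptions(city_bounds, rng, callback_url):
    """Build one parameter dict per city (field names are assumptions)."""
    subscriptions = []
    for city, bounds in city_bounds.items():
        lat, lng = random_point(bounds, rng)
        subscriptions.append({
            "city": city,            # kept only for bookkeeping
            "object": "geography",   # assumed field name
            "aspect": "media",       # assumed field name
            "lat": lat,
            "lng": lng,
            "radius": RADIUS_METERS,
            "callback_url": callback_url,
        })
    return subscriptions


if __name__ == "__main__":
    rng = random.Random(2015)  # seeded so the centers are reproducible
    for sub in build_subscriptions(CITY_BOUNDS, rng, "https://example.com/callback"):
        print(sub)
```

Each dictionary would then be sent as one subscription request, and the callback would receive the media updates to store.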