vulab | Category Archive | Big Data

Posted on 29 April 2017 by Srinivas Nelakuditi

Twitter Feed to Apache Hive using Apache Flume

Step 1: Create a folder in HDFS to hold the tweet data from Twitter.

hadoop fs -mkdir /demo/tweets

Step 2: Configure Flume by creating a Flume configuration file with a source, a sink, and a channel.

NOTE: Please get your own Twitter credentials by registering at twitter.com as a developer. Replace the xxxxxxxxxxx in […]

Posted on 07 March 2017 by Srinivas Nelakuditi

Oozie Job Scheduling for Hive using Coordinator and Workflow

Let us create a Coordinator job for Oozie. The job will run a Hive script every 5 minutes.

Create a file job.properties:

nameNode=hdfs://ip-10-74-66-159.vulab.com:8020
jobTracker=ip-10-74-66-190.vulab.com:8050
userName=srinivas
script_name_external = ${nameNode}/user/${userName}/hive_scripts/external.hive
database=dev_stg
oozie.use.system.libpath=true

Create a file workflow.xml:

<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
<start to = "Create_External_Table" />
<action name = "Create_External_Table">
<hive xmlns […]
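The script_name_external entry in job.properties is built from two other entries, ${nameNode} and ${userName}. The short Python sketch below shows what that reference expands to. It only illustrates the substitution and is not how Oozie itself resolves properties; the helper names parse_properties and expand are made up for this example.

```python
import re

# Contents of job.properties as given in the post.
JOB_PROPERTIES = """\
nameNode=hdfs://ip-10-74-66-159.vulab.com:8020
jobTracker=ip-10-74-66-190.vulab.com:8050
userName=srinivas
script_name_external = ${nameNode}/user/${userName}/hive_scripts/external.hive
database=dev_stg
oozie.use.system.libpath=true
"""


def parse_properties(text):
    """Parse simple key=value lines into a dict (hypothetical helper)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def expand(props):
    """Replace each ${name} with the value of that property (single pass).

    Unknown names are left untouched. This is an illustration only,
    not Oozie's own resolution logic.
    """
    pattern = re.compile(r"\$\{([^}]+)\}")
    return {
        key: pattern.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        for key, value in props.items()
    }


resolved = expand(parse_properties(JOB_PROPERTIES))
print(resolved["script_name_external"])
# prints hdfs://ip-10-74-66-159.vulab.com:8020/user/srinivas/hive_scripts/external.hive
```

Printing the expanded value before submitting the job is a quick way to confirm that the Hive script path points where you expect in HDFS.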

Posted on 03 February 2015 by Srinivas Nelakuditi

Apache Spark Introduction from Vulab Hadoop Novice to Professional Training

Why Apache Spark? To understand why, let us listen to the story of YouTube. According to YouTube, as of December 1, 2014, 300 hours of video are uploaded every minute. If YouTube were built with hardware and software from Oracle, compared with open source Big Data solutions: users will upload about 300 hours of […]

Posted on 06 December 2014 by Srinivas Nelakuditi

Apache Hadoop 2.6 is Available Now.

Congrats to the Apache Hadoop community on releasing Hadoop 2.6. YARN storage features have been upgraded in this version. See the Hadoop 2.6.0 Release Notes for details. Try it out and let us know if you find any bugs or interesting enhancements.

Posted on 18 October 2014 by Srinivas Nelakuditi

Install Apache Storm for Development

Step 1: Boot up your Ubuntu virtual machine using Oracle VirtualBox.
Step 2: Install JDK 7 or JDK 8 and set JAVA_HOME.
Step 3: Install Maven2.
Step 4: Download Storm and extract it using tar -xvzf apache-storm-0.9.3-rc1.tar.gz

Your environment is now ready for running your Storm projects. In the next blog we will code for Storm […]

Posted on 12 October 2014 by Srinivas Nelakuditi

Guarantees from Apache Storm

Apache Storm provides guarantees and new possibilities which could not be achieved with Hadoop and other batch-oriented data processing engines. Apache Storm provides real-time computation and facilitates real-time feedback for any data. Storm helps in parallel real-time processing of data.

Key Guarantees of Storm:

Broad Set of Use Cases: Apache Storm can process streams of […]

Posted on 12 October 2014 by Srinivas Nelakuditi

Apache Storm Introduction

Apache Storm is a distributed real-time computation engine for crunching massive amounts of data. Apache Storm can be described as Hadoop for real time: Storm processes data in real time, whereas Hadoop processes data in a batch-oriented fashion. Apache Storm is an open source project. We at Vulab use Storm in our large Fortune 500 projects and clients […]

Posted on 11 October 2014 by Srinivas Nelakuditi

Apache Kafka Tutorial Series

Apache Kafka Tutorials

Apache Kafka is the industry-leading open source distributed messaging platform. Kafka provides the following:

Persistent messaging to disk with O(1) disk structures that provide constant-time performance even with many TB of stored messages.
High throughput: even with modest hardware, Kafka can support hundreds of thousands of messages per second.
Explicit […]

Posted on 11 October 2014 by Srinivas Nelakuditi

Apache Kafka Message Consumer API Tutorial

In this hands-on tutorial we will use the Apache Kafka API to consume, or read, messages from Apache Kafka topics. Please read the following two blogs before attempting this tutorial:

Apache Kafka Installation
Apache Kafka Message Producer API Tutorial

Step 1: Create a Kafka Consumer using the API
Step 2: Set Eclipse Runtime Parameters for the Consumer

Posted on 11 October 2014 by Srinivas Nelakuditi

Apache Kafka Message Producer API Tutorial

In this tutorial we will learn to use the Apache Kafka High Level Producer API to send messages from a Java program. Please make sure you have completed the Apache Kafka Installation tutorial before you start this tutorial.

Step 1: Install JDK

Make sure you have JDK 1.7 or JDK 1.8 installed on your machine. You can download and […]
