vulab | Category Archive | Apache Hadoop


Posted on 29 April 2017 by Srinivas Nelakuditi

Twitter Feed to Apache Hive using Apache Flume

Step 1: Create a folder in HDFS to hold the tweet data loaded from Twitter

hadoop fs -mkdir /demo/tweets

Step 2: Configure Flume by creating a Flume configuration file with a source, a sink and a channel

NOTE: Please get your own Twitter credentials by registering at twitter.com as a developer. Replace the xxxxxxxxxxx in […]
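The excerpt stops before the configuration file itself, so the following is only a minimal sketch of what a source/channel/sink configuration of this kind could look like, not the file from the post. The agent and component names (TwitterAgent, Twitter, MemChannel, HDFS), the file name twitter.conf, and the capacity and roll settings are all assumptions. The source type shown is the Twitter source that ships with Apache Flume; the original post may have used a different one.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical Flume agent configuration to twitter.conf.
# It does not contact Twitter or HDFS; it just generates the file.
set -e

# The quoted 'EOF' keeps the shell from expanding anything inside the file body.
cat > twitter.conf <<'EOF'
# Name the components of this agent (names are illustrative assumptions)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source: Twitter stream; replace each xxxxxxxxxxx with your own credentials
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxx

# Channel: in-memory buffer between source and sink (sizes are assumptions)
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# Sink: write events into the HDFS folder created in Step 1
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
EOF

echo "Wrote twitter.conf"

# On a node with Flume installed, the agent could then be started with:
#   flume-ng agent --conf conf --conf-file twitter.conf --name TwitterAgent
```

The memory channel is the simplest choice for a demo; it loses buffered events if the agent stops, so a file channel may be preferable when the data matters.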


Posted on 07 March 2017 by Srinivas Nelakuditi

Oozie Job Scheduling for Hive using Coordinator and Workflow

Let us create a Coordinator job for Oozie. The job will run a Hive script every 5 minutes.

Create a file job.properties:

nameNode=hdfs://ip-10-74-66-159.vulab.com:8020
jobTracker=ip-10-74-66-190.vulab.com:8050
userName=srinivas
script_name_external=${nameNode}/user/${userName}/hive_scripts/external.hive
database=dev_stg
oozie.use.system.libpath=true

Create a file workflow.xml:

<workflow-app xmlns="uri:oozie:workflow:0.4" name="simple-Workflow">
<start to="Create_External_Table" />
<action name="Create_External_Table">
<hive xmlns […]
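The excerpt ends before the coordinator definition, so here is a minimal sketch of a coordinator.xml that triggers a workflow every 5 minutes. It is not the file from the post: the application name hive-coordinator and the properties startTime, endTime and workflowAppUri are hypothetical and would have to be defined in job.properties, along with oozie.coord.application.path pointing at the HDFS directory that holds coordinator.xml.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical Oozie coordinator definition to coordinator.xml.
set -e

# The quoted 'EOF' stops the shell from trying to expand the ${...} EL expressions,
# which Oozie resolves itself when the job is submitted.
cat > coordinator.xml <<'EOF'
<coordinator-app name="hive-coordinator"
                 frequency="${coord:minutes(5)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- workflowAppUri (assumed name): HDFS directory containing workflow.xml -->
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
EOF

echo "Wrote coordinator.xml"

# After copying the files to HDFS, the job could be submitted with the Oozie CLI:
#   oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run
```

Oozie expects startTime and endTime as UTC timestamps such as 2017-03-07T00:00Z; the coordinator then materializes one workflow run for each 5-minute slot in that window.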


Posted on 06 December 2014 by Srinivas Nelakuditi

Apache Hadoop 2.6 is Available Now.

Congratulations to the Apache Hadoop community on releasing Hadoop 2.6. YARN and storage features have been upgraded in this version. See the Hadoop 2.6.0 Release Notes for details. Try it out and let us know if you find any bugs or interesting enhancements.
