Oozie Job Scheduling for Hive using Coordinator and Workflow

Posted on 07 March 2017 by Srinivas Nelakuditi

Let us create an Oozie coordinator job that runs a Hive script every 5 minutes. The coordinator handles the schedule, and the workflow it triggers runs the Hive action.

Create a file named job.properties on the local filesystem:

nameNode=hdfs://ip-10-74-66-159.vulab.com:8020
jobTracker=ip-10-74-66-190.vulab.com:8050

userName=srinivas
script_name_external=${nameNode}/user/${userName}/external.hive
database=dev_stg

oozie.use.system.libpath=true

Create a file workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="simple-Workflow">
   <start to="Create_External_Table"/>

   <action name="Create_External_Table">
      <hive xmlns="uri:oozie:hive-action:0.4">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <script>${script_name_external}</script>
      </hive>
      <ok to="end"/>
      <error to="kill_job"/>
   </action>

   <kill name="kill_job">
      <message>Job failed</message>
   </kill>

   <end name="end"/>
</workflow-app>

Create a file coordinator.xml

<coordinator-app xmlns="uri:oozie:coordinator:0.2" name="create_table_hive"
frequency="${coord:minutes(5)}" start="2017-02-01T08:00Z"
end="2027-02-01T08:00Z" timezone="America/Los_Angeles">
   
   <controls>
      <timeout>1</timeout>
      <concurrency>1</concurrency>
      <execution>FIFO</execution>
      <throttle>1</throttle>
   </controls>
   
   <action>
      <workflow>
         <app-path>hdfs://ip-10-74-66-159.vulab.com:8020/user/srinivas/workflow.xml</app-path>
      </workflow>
   </action>
	
</coordinator-app>

Create a file called external.hive

CREATE TABLE IF NOT EXISTS dev_stg.external_table(
   name string,
   age int,
   address string,
   zip int
)
row format delimited
fields terminated by ','
stored as textfile;

Save the files in the following locations:

  • Save job.properties in /home/srinivas, your home folder on the local Linux filesystem
  • Upload coordinator.xml to /user/srinivas in HDFS using hdfs dfs -put coordinator.xml /user/srinivas
  • Upload workflow.xml to /user/srinivas in HDFS using hdfs dfs -put workflow.xml /user/srinivas
  • Upload external.hive to /user/srinivas in HDFS using hdfs dfs -put external.hive /user/srinivas
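
The upload steps above can be collected into one small script. This is only a sketch: the names HDFS_DIR, DRY_RUN, run and stage_files are illustrative and not part of the original setup, and it assumes the three files are in the current directory. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch: upload the Oozie application files to HDFS.
# HDFS_DIR and DRY_RUN are illustrative names chosen for this example.
HDFS_DIR="${HDFS_DIR:-/user/srinivas}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

stage_files() {
  # Create the target directory if it does not exist yet
  run hdfs dfs -mkdir -p "$HDFS_DIR"
  local f
  for f in coordinator.xml workflow.xml external.hive; do
    # -f overwrites an existing copy, so the script can be re-run after edits
    run hdfs dfs -put -f "$f" "$HDFS_DIR/"
  done
  # List the directory to confirm the upload
  run hdfs dfs -ls "$HDFS_DIR"
}

stage_files
```

Using -put -f is a convenience choice here: a plain -put fails if the file is already present in HDFS, which is easy to hit when iterating on workflow.xml.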

Run the following command to submit the job for scheduling:

oozie job -oozie http://ip-10-74-66-190.vulab.com:11000/oozie/ -config /home/srinivas/job.properties -D oozie.coord.application.path=hdfs://ip-10-74-66-159.vulab.com:8020/user/srinivas/coordinator.xml -run
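
Before running the command above, it may be worth checking the application first. The sketch below assumes the installed Oozie client supports oozie validate on local XML files and the -dryrun option for coordinator jobs; the exact behaviour varies between Oozie versions. The names precheck and COORD_PATH are illustrative, and the script expects to be run from the directory holding the local copies of the XML files.

```shell
#!/usr/bin/env bash
# Sketch: validate the XML definitions and dry-run the coordinator.
# precheck and COORD_PATH are illustrative names chosen for this example.
OOZIE_URL="${OOZIE_URL:-http://ip-10-74-66-190.vulab.com:11000/oozie/}"
COORD_PATH="hdfs://ip-10-74-66-159.vulab.com:8020/user/srinivas/coordinator.xml"

precheck() {
  # Schema-validate the local copies of the application files
  oozie validate workflow.xml || return 1
  oozie validate coordinator.xml || return 1
  # Dry run: the server parses the coordinator without scheduling it
  oozie job -oozie "$OOZIE_URL" \
    -config /home/srinivas/job.properties \
    -D "oozie.coord.application.path=$COORD_PATH" \
    -dryrun
}
```

Call precheck after the files have been uploaded; if it returns a non-zero status, fix the reported problem before submitting with -run.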

To check the status of the job, open the Oozie web console in a browser:
http://ip-10-74-66-190.vulab.com:11000/oozie/
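
The status can also be checked from the command line with oozie job -info, which needs the coordinator job ID. On submission the Oozie client prints a line of the form "job: <id>", so the ID can be captured at that point. The following is a rough sketch; extract_job_id and submit_and_check are hypothetical helper names, not part of Oozie.

```shell
#!/usr/bin/env bash
# Sketch: submit the coordinator, capture its ID, and print its status.
# extract_job_id and submit_and_check are illustrative helper names.
OOZIE_URL="${OOZIE_URL:-http://ip-10-74-66-190.vulab.com:11000/oozie/}"

# Read client output on stdin and print the ID from the "job: <id>" line.
extract_job_id() {
  sed -n 's/^job: *//p' | head -n 1
}

submit_and_check() {
  local job_id
  job_id="$(oozie job -oozie "$OOZIE_URL" \
    -config /home/srinivas/job.properties \
    -D oozie.coord.application.path=hdfs://ip-10-74-66-159.vulab.com:8020/user/srinivas/coordinator.xml \
    -run | extract_job_id)"
  echo "Submitted coordinator: $job_id"
  # Show the coordinator status and the actions materialized so far
  oozie job -oozie "$OOZIE_URL" -info "$job_id"
}
```

Calling submit_and_check replaces the manual submit step; the printed ID is the same value that the -kill option expects.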

To kill the job, pass its coordinator job ID (the ID ending in -C) to the -kill option:
oozie job -oozie http://ip-10-74-66-190.vulab.com:11000/oozie/ -kill 0000020-170307022626285-oozie-oozi-C