Oozie is an open source workflow scheduler for Hadoop. It simplifies defining workflows and coordinating jobs. With Oozie we can declare dependencies between jobs and on their input data, and let the Oozie scheduler run each job automatically once its dependencies are met.
Installation
I used Cloudera's Oozie Repository. To add the repository to OpenSUSE's zypper package manager, as root, enter the following:
[linux-mlkb:/ROOT]> zypper addrepo -f http://archive.cloudera.com/sles/11/x86_64/cdh/cloudera-cdh3.repo
[linux-mlkb:/ROOT]> zypper search oozie
Loading repository data...
Reading installed packages...
S | Name         | Summary                                               | Type
--+--------------+-------------------------------------------------------+-----------
  | oozie        | Oozie is a system that runs workflows of Hadoop jobs. | package
  | oozie        | Oozie is a system that runs workflows of Hadoop jobs. | srcpackage
  | oozie-client | Client for Oozie Workflow Engine                      | package
Now we can install the software:
[linux-mlkb:/ROOT]> zypper install oozie
Loading repository data...
Reading installed packages...
Resolving package dependencies...
The following NEW packages are going to be installed:
  bigtop-utils oozie oozie-client
3 new packages to install.
Overall download size: 91.8 MiB. After the operation, additional 111.6 MiB will be used.
Continue? [y/n/?] (y): y
Retrieving package bigtop-utils-3.4.0+3-1.noarch (1/3), 6.8 KiB ( 13.6 KiB unpacked)
Retrieving: bigtop-utils-3.4.0+3-1.noarch.rpm .............................................[done]
Retrieving package oozie-client-2.3.2+27.30-1.noarch (2/3), 34.7 MiB ( 53.9 MiB unpacked)
Retrieving: oozie-client-2.3.2+27.30-1.noarch.rpm .........................................[done (645.2 KiB/s)]
Retrieving package oozie-2.3.2+27.30-1.noarch (3/3), 57.1 MiB ( 57.7 MiB unpacked)
Retrieving: oozie-2.3.2+27.30-1.noarch.rpm ................................................[done (339.6 KiB/s)]
Installing: bigtop-utils-3.4.0+3-1 ........................................................[done]
Installing: oozie-client-2.3.2+27.30-1 ....................................................[done]
Installing: oozie-2.3.2+27.30-1 ...........................................................[done]
Additional rpm output:
insserv: Service network is missed in the runlevels 2 4 to use service oozie
Note: This output shows SysV services only and does not include native systemd services. SysV configuration data might be overridden by native systemd configuration.
oozie 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Configuration
The oozie utility scripts should now be located in the following folder:
/usr/lib/oozie
Oozie requires a database. By default it uses Apache Derby, an embedded RDBMS written in Java. To use another database such as MySQL, install and configure that database as root before going further.
(install and configure a database)
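As a rough sketch of what the database step could look like for MySQL, the helper below only prints the SQL for a dedicated database and account; it does not run anything itself. The function name oozie_mysql_sql, the database name oozie, the user oozie, and the placeholder password are all assumptions made for illustration, not values taken from this setup.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that creates a database and account for Oozie.
# Arguments (all illustrative defaults): database name, user name, password.
oozie_mysql_sql() {
    db="${1:-oozie}"
    user="${2:-oozie}"
    pass="${3:-changeme}"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'localhost';
EOF
}

# Example (not run here): review the SQL first, then feed it to the mysql client.
#   oozie_mysql_sql oozie oozie 'secret' | mysql -u root -p
```

Printing the statements instead of executing them keeps the step reviewable; adjust the host part ('localhost') if Oozie and MySQL run on different machines.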
Oozie requires a JDBC driver. The following example adds one for MySQL. Enter the following as root:
[linux-mlkb:/ROOT]> service oozie stop
[linux-mlkb:/ROOT]> mkdir tmp
[linux-mlkb:/ROOT]> cd tmp
[linux-mlkb:/ROOT]> wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.31.tar.gz
[linux-mlkb:/ROOT]> tar -zxf mysql-connector-java-5.1.31.tar.gz
[linux-mlkb:/ROOT]> cd mysql-connector-java-5.1.31
[linux-mlkb:/mysql-connector-java-5.1.31]> cp mysql-connector-java-5.1.31-bin.jar /var/lib/oozie/
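Copying the driver jar is not enough on its own: Oozie also has to be pointed at MySQL instead of Derby through the JPAService JDBC properties in oozie-site.xml. The sketch below prints those properties so they can be pasted inside the <configuration> element. The helper names, the JDBC URL (localhost, port 3306, database oozie), and the credentials are assumptions for illustration; com.mysql.jdbc.Driver is the driver class shipped with Connector/J 5.1.

```shell
#!/bin/sh
# Print one <property> element in the Hadoop-style XML used by oozie-site.xml.
print_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical helper: print the JDBC settings for a MySQL-backed Oozie.
# Arguments (illustrative defaults): JDBC URL, user name, password.
oozie_jdbc_properties() {
    url="${1:-jdbc:mysql://localhost:3306/oozie}"
    user="${2:-oozie}"
    pass="${3:-changeme}"
    print_property oozie.service.JPAService.jdbc.driver   com.mysql.jdbc.Driver
    print_property oozie.service.JPAService.jdbc.url      "$url"
    print_property oozie.service.JPAService.jdbc.username "$user"
    print_property oozie.service.JPAService.jdbc.password "$pass"
}

# Example (not run here): inspect the output, then merge it into
# oozie-site.xml under the Oozie configuration directory by hand.
#   oozie_jdbc_properties jdbc:mysql://localhost:3306/oozie oozie 'secret'
```

The values are not XML-escaped, so a password containing characters such as & or < would need escaping before it goes into the file.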
As the oozie user, create the schema that Oozie needs by executing the command below:
[linux-mlkb:/]> sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run
Sample Output
setting OOZIE_CONFIG=/etc/oozie/conf
setting OOZIE_DATA=/var/lib/oozie
setting OOZIE_LOG=/var/log/oozie
setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
setting CATALINA_TMPDIR=/var/lib/oozie
setting CATALINA_PID=/var/run/oozie/oozie.pid
setting CATALINA_BASE=/usr/lib/oozie/oozie-server-0.20
setting CATALINA_OPTS=-Xmx1024m
setting OOZIE_HTTPS_PORT=11443
...
DONE
Oozie DB has been created for Oozie version '3.3.2-cdh4.7.0'
The SQL commands have been written to: /tmp/ooziedb-8250405588513665350.sql
Now we need to configure the web console. You will need to download the ExtJS lib using the following commands, as root:
[root@master tmp]# wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
[root@master tmp]# /usr/lib/oozie/bin/oozie-setup.sh -extjs ext-2.2.zip
Finally, start the Oozie server and check its status by running the following commands:
[root@master tmp]# service oozie start
[root@master tmp]# service oozie status
oozie.service - LSB: Oozie server daemon
Loaded: loaded (/etc/init.d/oozie)
Active: active (exited) since Wed, 08 Apr 2015 20:06:43 -0400; 5s ago
Process: 8501 ExecStop=/etc/init.d/oozie stop (code=exited, status=0/SUCCESS)
Process: 8600 ExecStart=/etc/init.d/oozie start (code=exited, status=0/SUCCESS)
CGroup: name=systemd:/system/oozie.service
Apr 08 20:06:43 linux-mlkb oozie[8600]: Setting OOZIE_HTTP_HOSTNAME: linux-mlkb
Apr 08 20:06:43 linux-mlkb oozie[8600]: Setting OOZIE_HTTP_PORT: 11000
Apr 08 20:06:43 linux-mlkb oozie[8600]: Setting OOZIE_ADMIN_PORT: 11001
Apr 08 20:06:43 linux-mlkb oozie[8600]: Setting OOZIE_BASE_URL: http://linux-mlkb:11000/oozie
Apr 08 20:06:43 linux-mlkb oozie[8600]: Using CATALINA_BASE: /var/lib/oozie/oozie-server
Apr 08 20:06:43 linux-mlkb oozie[8600]: Setting CATALINA_OUT: /var/log/oozie/catalina.out
Apr 08 20:06:43 linux-mlkb oozie[8600]: Using CATALINA_PID: /var/run/oozie/oozie.pid
Apr 08 20:06:43 linux-mlkb oozie[8600]: Using CATALINA_OPTS: -Dderby.stream.error.file=/var/log/oozie/derby.log
Apr 08 20:06:43 linux-mlkb oozie[8600]: Adding to CATALINA_OPTS: -Doozie.home.dir=/usr/lib/oozie -Doozie.config.dir=/etc/oozie -Doozie.log.dir=/var/log/oozie -Doozie.d...:11000/oozie
Apr 08 20:06:43 linux-mlkb oozie[8600]: Oozie start succeeded
[root@master tmp]# oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL
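The server can take a few seconds to come up after the service starts, so a single status call may fail if it is issued too early. A minimal sketch of a retry loop around the same status command is shown here; the function name wait_for_oozie and the default attempt count and delay are assumptions, not part of Oozie.

```shell
#!/bin/sh
# Poll the Oozie server until it reports "System mode: NORMAL" or attempts run out.
# Arguments (illustrative defaults): Oozie URL, number of attempts, seconds between attempts.
wait_for_oozie() {
    url="${1:-http://localhost:11000/oozie}"
    attempts="${2:-10}"
    delay="${3:-3}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Errors from the client are ignored while the server is still starting.
        if oozie admin -oozie "$url" -status 2>/dev/null | grep -q "System mode: NORMAL"; then
            echo "Oozie is up (attempt $i)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Oozie did not report NORMAL after $attempts attempts" >&2
    return 1
}

# Example (not run here): wait up to 10 x 3 seconds for the local server.
#   wait_for_oozie http://localhost:11000/oozie 10 3
```

The function returns a non-zero exit status on timeout, so it can gate later steps in a provisioning script.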
That's it. Oozie should now be running. Below is a sample of what the Oozie console should look like.
