When you work with multiple Hadoop distributions, repeated upgrades and downgrades can bring you to a point where you want to start over from a clean slate. Unfortunately, Ambari 1.x provides no way to wipe the entire Hadoop cluster clean.

After working with several different versions of Hortonworks HDP and Pivotal HD, I reached a point where I wanted to uninstall everything completely.

I compiled this document to give an overview of how to do that.

I want to keep this document as simple and straightforward as possible. Many of these commands must be executed on every node in the cluster, so to run them in parallel I recommend a parallel SSH utility such as MaSSH or pdsh. Choose whichever tool suits you.
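If neither MaSSH nor pdsh is available, a plain Bash loop over ssh can do the same job. The sketch below is only an illustration: it assumes a hypothetical file hosts.txt with one hostname per line and passwordless SSH as root to every node, and run_on_all is a name made up for this example, not part of any tool.

```shell
#!/usr/bin/env bash
# run_on_all HOSTFILE COMMAND...
# Hypothetical helper: runs COMMAND on every host listed in HOSTFILE
# (one per line) in parallel and prefixes each output line with the host.
# Assumes passwordless SSH. Set SSH=echo to print instead of executing.
run_on_all() {
  local hostfile=$1; shift
  local ssh_cmd=${SSH:-ssh}
  local host
  while read -r host; do
    # Skip blank lines and comment lines in the host file.
    [[ -z $host || $host == \#* ]] && continue
    # </dev/null keeps ssh from consuming the rest of the host file.
    "$ssh_cmd" "$host" "$@" </dev/null 2>&1 | sed "s/^/[$host] /" &
  done < "$hostfile"
  wait   # block until every background ssh has finished
}
```

For example, run_on_all hosts.txt ambari-agent stop would stop the agent everywhere. With pdsh, the equivalent is pdsh -w ^hosts.txt 'ambari-agent stop', where the leading ^ tells pdsh to read the host list from a file.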

First and Foremost – Stop All Services

Using the Ambari Web UI or the REST API, stop all the services running on the cluster.
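For the REST API route, a minimal sketch follows. It sets the desired state of every service to INSTALLED, which is how Ambari represents "stopped". The server host, cluster name and admin:admin credentials are placeholders (AMBARI_HOST, CLUSTER and AMBARI_AUTH are names made up for this example), and it assumes Ambari listens on its default port 8080.

```shell
#!/usr/bin/env bash
# Placeholder connection settings: override them for your environment.
AMBARI_HOST=${AMBARI_HOST:-ambari.example.com}
CLUSTER=${CLUSTER:-mycluster}
AMBARI_AUTH=${AMBARI_AUTH:-admin:admin}

# Request body: move all services to INSTALLED (that is, stopped).
STOP_BODY='{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}'

stop_all_services() {
  # Newer Ambari releases reject modifying requests without the
  # X-Requested-By header; sending it is harmless elsewhere.
  curl -sS -u "$AMBARI_AUTH" -H 'X-Requested-By: ambari' \
       -X PUT -d "$STOP_BODY" \
       "http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services"
}
```

Calling stop_all_services submits the stop request. You can then follow its progress in the Ambari Web UI before moving on.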

Stop Ambari Server and Client instances

Log in to the Ambari server host and stop the Ambari server.

# ambari-server stop

Stop the Ambari agent on all the nodes in the cluster.

(You can use the tool of your choice to execute the command below on all the hosts in the cluster.)

# ambari-agent stop

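Verify that the Ambari server and all the Ambari agents have actually stopped, for example with ambari-server status on the server host and ambari-agent status on each node.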
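As an extra check that does not depend on the Ambari scripts, you can look for leftover processes. This is only a sketch: ambari_procs is a hypothetical helper that filters ps -ef output for anything containing "ambari", which assumes your Ambari server and agent command lines include that word.

```shell
#!/usr/bin/env bash
# ambari_procs: read `ps -ef` output on stdin and print any lines that look
# like Ambari server or agent processes (case-insensitive match on "ambari").
ambari_procs() {
  # The [a] bracket keeps grep from matching its own command line.
  grep -i '[a]mbari'
}
```

On each node, ps -ef | ambari_procs should print nothing once the server and agents are down.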

Remove All Hadoop Packages

It’s time to tear everything apart now.

Run the following commands on all the hosts in the cluster (including the master nodes).

# yum -y remove `yum list installed | grep -i hadoop | cut -d. -f1 | sed -e :a -e '$!N; s/\n/ /; ta'`
# yum -y remove 'ambari*'
# yum -y remove `yum list installed | grep -w 'HDP' | cut -d. -f1 | grep -v "^[ ]" | sed -e :a -e '$!N; s/\n/ /; ta'`
# yum -y remove `yum list installed | egrep -w 'hcatalog|hive|hbase|zookeeper|oozie|pig|sqoop|snappy|hadoop-lzo|knox|hadoop|hue' | cut -d. -f1 | grep -v "^[ ]" | sed -e :a -e '$!N; s/\n/ /; ta'`
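Before removing anything, it can help to review exactly which packages will go. The sketch below is an alternative to parsing yum's wrapped output: it filters plain package names, such as those printed by rpm -qa --qf '%{NAME}\n'. The helper name hadoop_pkgs and the rule "component name followed by -, _, . or end of name" are my own assumptions, chosen so that, for example, pigz is not mistaken for pig.

```shell
#!/usr/bin/env bash
# hadoop_pkgs: read package names (one per line) on stdin and print the ones
# whose name starts with a Hadoop/Ambari component, sorted and de-duplicated.
# The component list mirrors the yum commands above; adjust it as needed.
HADOOP_PKG_RE='^(ambari|hcatalog|hive|hbase|zookeeper|oozie|pig|sqoop|snappy|hadoop|knox|hue)([-_.]|$)'
hadoop_pkgs() {
  # LC_ALL=C gives a byte-wise, locale-independent sort order.
  grep -Ei "$HADOOP_PKG_RE" | LC_ALL=C sort -u
}
```

Usage: rpm -qa --qf '%{NAME}\n' | hadoop_pkgs lists the candidates; once you are happy with the list, yum -y remove $(rpm -qa --qf '%{NAME}\n' | hadoop_pkgs) removes them.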

Remove the databases now

Many Hadoop components, such as Ambari, Hive, Hue and Oozie, store their metadata in a relational database. Depending on your installation, these databases may be defined in whichever RDBMS you use. As far as I know, many users keep the defaults: the embedded Derby database for Oozie and the SQLite database for Hue.

In my case everything is set up in MySQL and PostgreSQL. You can remove the databases in either of two ways:

1. Drop the databases.
2. Remove the entire RDBMS installation.

I prefer the more graceful first option: running DROP DATABASE IF EXISTS DB_NAME for each database.
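With several databases to drop, generating the statements avoids typos. A minimal sketch follows; drop_sql is a hypothetical helper, and the database names in the usage lines (hive, oozie, hue, ambari) are only examples, so substitute the names from your own installation.

```shell
#!/usr/bin/env bash
# drop_sql DB...: print one DROP DATABASE IF EXISTS statement per name given.
drop_sql() {
  local db
  for db in "$@"; do
    printf 'DROP DATABASE IF EXISTS %s;\n' "$db"
  done
}
```

The output can be piped straight into the database client, for example:

# drop_sql hive oozie hue | mysql -u root -p
# drop_sql ambari | psql -U postgres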

Remove HDFS Directories

Next, remove the contents of HDFS as well. Removing the packages with yum took away only the binaries; the data is still present on the NameNode and the DataNodes.

Remove the {dfs.namenode.name.dir} and {dfs.datanode.data.dir} directories. In my case these are /hadoop/nn and /hadoop/data[1-5].

Remove these directories from all the nodes.

# rm -rf /hadoop/nn /hadoop/data*

Replace these paths with the ones configured in your environment!
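If you are not sure which directories are configured, you can read them from hdfs-site.xml, assuming /etc/hadoop/conf/hdfs-site.xml is still present (the /etc cleanup happens in a later step). This is a rough sketch, not an XML parser: get_prop and split_dirs are hypothetical helpers, and get_prop assumes the common layout where <value> sits on the same line as <name> or on the line right after it.

```shell
#!/usr/bin/env bash
# get_prop FILE NAME: print the <value> of property NAME from a Hadoop
# XML config file, assuming <value> is on the <name> line or the next one.
get_prop() {
  grep -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# split_dirs: turn a comma-separated directory list (stdin) into one local
# path per line, dropping an optional [STORAGE_TYPE] tag and file:// prefix.
split_dirs() {
  tr ',' '\n' | sed -e 's/^ *//' -e 's/^\[[A-Za-z_]*]//' -e 's#^file://##'
}
```

For example, get_prop /etc/hadoop/conf/hdfs-site.xml dfs.datanode.data.dir | split_dirs prints one DataNode directory per line, which you can review before passing the paths to rm -rf.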

Finally, Remove all the remaining files for HDP

Remove all the remaining directories for Hadoop and its related components.

Execute the following commands on all the nodes.

# rm -rf `find /etc -maxdepth 1 | egrep -wi 'mysql|hcatalog|hive|hbase|zookeeper|oozie|pig|sqoop|snappy|hadoop|knox|hue|ambari|tez|flume|storm|accumulo|spark|kafka|falcon|slider|ganglia|nagios|phoenix' | sed -e :a -e '$!N; s/\n/ /; ta'`
# rm -rf `find /var/log -maxdepth 1 | egrep -wi 'mysql|hcatalog|hive|hbase|zookeeper|oozie|pig|sqoop|snappy|hadoop|knox|hue|ambari|tez|flume|storm|accumulo|spark|kafka|falcon|slider|ganglia|nagios|phoenix' | sed -e :a -e '$!N; s/\n/ /; ta'`
# rm -rf `find /tmp -maxdepth 1 | egrep -wi 'hadoop' | sed -e :a -e '$!N; s/\n/ /; ta'`
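Because these rm -rf commands delete whatever the pattern matches, a dry run first is cheap insurance. The sketch below lists the same matches without deleting anything; leftovers and COMPONENT_RE are hypothetical names for this example, and -mindepth 1 is my addition so the top-level directory itself is never listed.

```shell
#!/usr/bin/env bash
# Component names matched as whole words, case-insensitively, as in the
# rm commands above.
COMPONENT_RE='mysql|hcatalog|hive|hbase|zookeeper|oozie|pig|sqoop|snappy|hadoop|knox|hue|ambari|tez|flume|storm|accumulo|spark|kafka|falcon|slider|ganglia|nagios|phoenix'

# leftovers DIR: print the top-level entries of DIR that the rm would remove.
leftovers() {
  find "$1" -mindepth 1 -maxdepth 1 | grep -Ewi "$COMPONENT_RE"
}
```

Run leftovers /etc and leftovers /var/log on a node and check the list. Note that mysql is in the pattern, so a MySQL installation you want to keep would be caught as well.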

Remove the YUM repositories for Ambari and HDP

Remove the repositories related to Ambari & HDP from all the nodes.

# rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP*
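To confirm that no Ambari or HDP repository definitions remain (for example HDP-UTILS.repo, which the HDP* glob also covers), a small check can help. This is a sketch: leftover_repos is a hypothetical helper, and it assumes the repo files are named starting with ambari or HDP.

```shell
#!/usr/bin/env bash
# leftover_repos [DIR]: list .repo files in DIR (default /etc/yum.repos.d)
# whose file name starts with "ambari" or "hdp", case-insensitively.
leftover_repos() {
  local dir=${1:-/etc/yum.repos.d}
  find "$dir" -maxdepth 1 -type f -name '*.repo' | grep -Ei '/(ambari|hdp)[^/]*$'
}
```

An empty result from leftover_repos means the repositories are gone. You may also want to run yum clean all afterwards so yum drops its cached metadata for them.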

That should leave you with a clean cluster from which Hortonworks HDP has been completely uninstalled.

