Hive Change Location for Database or Schema

Hive Change Location for Database or Schema

Before we begin, I would like to mention that in Hive keyword SCHEMA and DATABASE can be used interchangeably. Meaning you can use the keyword SCHEMA instead of DATABASE in all the database related commands.

Notes:

  • I would also like to mention that I do not own any responsibility of corrupting the data inside Hive or Hive metastore. This is purely based on the solution which worked in my environment and might not necessarily work for you as well. Please be careful to take right backups before you follow these suggestions.
  • The solution assumes that the database does not have any tables. In case you have any tables inside the database, the solution might be slightly different. You will have to move re-point and move the tables and partitions inside the tables before you actually modify the database directory location in Hive.
  • Having done with the disclaimer let’s proceed!

    By default Hive will create a separate directory for each database. The tables for the database will be store inside the subdirectories of the directory for database. The directory for the databases are created inside of the master directory which is referred to as the Warehouse Directory. The location for the Warehouse Directory is defined by the parameter hive.metastore.warehouse.dir inside the Hive configuration.

    In most distributions of Hadoop it is /user/hive/warehouse inside your HDFS file system.

    Anytime a new database is created in Hive, there will be a directory created under /user/hive/warehouse.

    Let’s check an example here.

    hive> CREATE DATABSE gagan_stage;
    OK
    Time taken: 0.638 seconds

    Let’s verify the location set for this database.

    hive> DESCRIBE DATABASE gagan_stage;
    OK
    gagan_stage hdfs://my-master-namenode:8020/user/hive/warehouse/gagan_stage.db
    Time taken: 0.043 seconds, Fetched: 1 row(s)

    You can cross check the presence of the directory /user/hive/warehouse/gagan_stange.db in your HDFS filesystem.

    However, you can also specify a custom location for the database while creating it. This can be done like below.

    hive> CREATE DATABASE gagan_stage_custom LOCATION ‘hdfs://my-master-namenode:8020/user/gagan/my_warehouse/gagan_stage_custom.db’;
    OK
    Time taken: 0.450 seconds

    Hive Change Location for Database or Schema

    However, in some cases you might want to modify the location of a Hive Database. A common case would be typo when specifying the custom location or quota or disk space issues.

    Hive does not allow changing the ‘Location’ of database. However, there is a workaround for this.

    You can update the Hive metastore for the database in question to update the directory location.

    In order to achieve this, you need to connect to the database server (MySQL or PgSQL) which holds the metastore for your Hive instance.

    Depending on your database you will have to run the update commands to update the database location. I will provide the example for the Hive Metastore stored in PgSQL.

    The database name for Hive metastore is the same unless custom modified. In my instance the database name for Hive metastore is metastore.

    Hive Change Location for Database or Schema

    I will modify the directory location for the database gagan_stage_custommentioned in my previous example.

    On Hive metastore let’s verify the current parameters in the metastore database.

    metastore=# SELECT * FROM “DBS” WHERE “NAME” = ‘gaga_stage_custom’;
    
     DB_ID | DESC |                                DB_LOCATION_URI                               |        NAME
    ———————+——————+——————————————————————————————————————————————————————————————————————————————+————————————————————
         4 |      | hdfs://my-master-namenode:8020/user/gagan/my_warehouse/gagan_stage_custom.db | gagan_stage_custom
    
    (1 row)
    

    Time to modify the database directory location for the database. I will point it to hfs://my-master-namenode:8020/user/hive/warehouse/gagan_stage_custom.db

    metastore=# UPDATE “DBS” SET “DB_LOCATION_URI” = ‘hfs://my-master-namenode:8020/user/hive/warehouse/gagan_stage_custom.db’ WHERE “NAME” = ‘gagan_stage_custom’;

    UPDATE 1

    That should do it. You can go ahead and verify the location in Hive metastore and Hive as well.

    metastore=# SELECT * FROM “DBS” WHERE “NAME” = ‘gaga_stage_custom’;
    
     DB_ID | DESC |                                DB_LOCATION_URI                               |        NAME
    ———————+——————+——————————————————————————————————————————————————————————————————————————————+————————————————————
         4 |      | hdfs://my-master-namenode:8020/user/hive/my_warehouse/gagan_stage_custom.db | gagan_stage_custom
    
    (1 row)
    
    
    hive> DESCRIBE DATABASE gagan_stage_custom;
    OK
    gagan_stage_custom		hdfs://my-master-namenode:8020/user/hive/warehouse/gagan_stage_custom.db
    Time taken: 0.065 seconds, Fetched: 1 row(s)
    

    Hive Change Location for Database or Schema

    No Comments

    Post a Comment

    Time limit is exhausted. Please reload CAPTCHA.