
To create ACID tables, ensure Hive is configured to support ACID transactions by setting the following properties:
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
Hive Version Compatibility
ACID transactions in Hive are supported from version 0.14.0 onwards. Ensure that your Hive installation is at least this version. Hive 3.x introduced significant improvements and additional features for ACID transactions.
Creating ACID Tables
Full ACID Table
Full ACID tables support all CRUD (Create, Retrieve, Update, Delete) operations and require the ORC file format:
CREATE TABLE acidtbl (
key int,
value string
)
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
Insert-Only ACID Table
Insert-only ACID tables support only insert operations and can use various storage formats:
CREATE TABLE acidtbl_insert_only (
key int,
value string
)
STORED AS TEXTFILE
TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only");
Converting Tables to ACID
Non-ACID to Full ACID
To convert a non-ACID managed table to a full ACID table (requires ORC format):
ALTER TABLE nonacidtbl SET TBLPROPERTIES ('transactional'='true');
Non-ACID to Insert-Only ACID
To convert a non-ACID managed table to an insert-only ACID table:
ALTER TABLE nonacidtbl SET TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
Data Operations on ACID Tables
Inserting Data
INSERT INTO acidtbl VALUES (1, 'a');
INSERT INTO acidtbl VALUES (2, 'b');
Updating Data
UPDATE acidtbl SET value='updated' WHERE key=1;
Performing Merge Operations
MERGE INTO acidtbl USING src
ON acidtbl.key = src.key
WHEN MATCHED AND src.value IS NULL THEN DELETE
WHEN MATCHED AND (acidtbl.value != src.value) THEN UPDATE SET value = src.value
WHEN NOT MATCHED THEN INSERT VALUES (src.key, src.value);
Understanding Table Structures
ACID Tables (Transactional)
ACID tables have a specific directory structure in HDFS:
/user/hive/warehouse/t/
├── base_0000022/
│ └── bucket_00000
├── delta_0000023_0000023_0000/
│ └── bucket_00000
└── delta_0000024_0000024_0000/
└── bucket_00000
- Base Directory: Contains the original data files.
- Delta Directories: Store changes (inserts, updates, deletes).
Non-ACID Tables
Non-ACID tables have a simpler structure:
/user/hive/warehouse/table_name/
├── file1.orc
├── file2.orc
└── file3.orc
# For partitioned tables:
/user/hive/warehouse/table_name/
├── partition_column=value1/
│ ├── file1.orc
│ └── file2.orc
└── partition_column=value2/
├── file3.orc
└── file4.orc
File Format Considerations
- Full ACID Tables: Only the ORC (Optimized Row Columnar) file format is supported for full ACID tables that allow all CRUD operations.
- Insert-Only ACID Tables: These tables support various file formats, not limited to ORC. You can use formats like TEXTFILE, CSV, AVRO, or JSON.
- Managed Tables: The managed table storage type is required for ACID tables.
- External Tables: ACID properties cannot be applied to external tables, as changes to external tables are beyond Hive’s control.
- Converting Existing Tables: When converting a non-ACID managed table to a full ACID table, the data must be in ORC format.
- Default Format: When creating a full ACID table without specifying the storage format, Hive defaults to using ORC.
Managed vs. External Tables
- Managed Tables: Support ACID transactions. They can be created as transactional (either full ACID or insert-only).
Example of a full ACID managed table:
CREATE TABLE managed_acid_table (
id INT,
name STRING
)
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
Example of an insert-only ACID managed table:
CREATE TABLE managed_insert_only_table (
id INT,
name STRING
)
STORED AS TEXTFILE
TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only");
External Tables: Do not support ACID transactions. These tables are used for data managed outside Hive’s control.
Example of an external table:
CREATE EXTERNAL TABLE external_table (
id INT,
name STRING
)
STORED AS TEXTFILE
LOCATION '/path/to/external/data';
Limitations of ACID Tables
- Performance Overhead: ACID tables introduce additional overhead due to the need for transactional logging and compaction processes.
- Storage Requirements: The delta files and base files can increase storage requirements.
- Compaction: Regular compaction is necessary to maintain performance and manage storage, which can add complexity.
- Version Dependency: Ensure that you are using a Hive version that supports the desired ACID features, as improvements and bug fixes are version-dependent.
- External Table Limitation: ACID properties cannot be applied to external tables.
Key Points
- Only managed tables can be converted to ACID tables.
- External tables cannot be made transactional.
- Full ACID tables require the ORC file format.
- Converting ACID tables back to non-ACID tables is not supported.
Stackademic 🎓
Thank you for reading until the end. Before you go:
- Please consider clapping and following the writer! 👏
- Follow us X | LinkedIn | YouTube | Discord
- Visit our other platforms: In Plain English | CoFeed | Differ
- More content at Stackademic.com






































