zt
merge Statement
Oracle9i introduces a new set of server functionality especially beneficial for the ETL (Extraction, Transformation, and Loading) part of any Business Intelligence process flow, addressing all the needs of highly scalable data transformation inside the database.
One of the most exciting new features addressing the needs for ETL is the SQL statement merge. The new SQL combines the sequence of conditional INSERT and UPDATE commands in a single atomic statement, depending on the existence of a record. This operation is commonly known as Upsert functionality.
In a data warehouse environment, tables (typically fact tables) need to be refreshed periodically with new data arriving from on-line systems. This new data may contain changes to existing rows in tables of the warehouse and/or new rows that need to be inserted. If a row in the new data corresponds to an item that already exists in the table, an UPDATE is performed; if the row's primary key does not exist in the table, an INSERT is performed. Many times, the source system cannot distinguish between newly inserted or changed information during the extraction; for very complex transformations, it's sometimes nearly impossible to know the effect of changed source data. These scenarios require the determination of insert versus update to be done during data loading.
Prior to Oracle9i, these operations were expressed either as a sequence of DMLs (INSERT/UPDATE) or as PL/SQL loops deciding, for each row, whether to insert or update the data. Both these techniques face performance handicaps: the first requires multiple data scans, and the second operates on a per-record basis. By extending SQL with a new syntax - the merge statement - Oracle9i overcomes the deficiencies of the old approaches and makes the implementation of warehousing applications more simple and intuitive.
The following is an example of the merge statement. The fact table SALES_FACT in a data warehouse for a retailer needs to be periodically updated with sales data coming from the on-line systems. If the retailer opens a new store, then the data for the store needs to be inserted into the SALES_FACT table.
Oracle9i Implementation:
In Oracle9i, the merge statement INSERTS and UPDATES the data with a single SQL statement.
merge INTO SALES_FACT D
USING SALES_JUL01 S
ON (D.TIME_ID = S.TIME_ID
AND D.STORE_ID = S.STORE_ID
AND D.REGION_ID = S.REGION_ID)
WHEN MATCHED THEN
UPDATE
SET d_parts = d_parts + s_parts,
d_sales_amt = d_sales_amt + s_sales_amt,
d_tax_amt = d_tax_amt + s_tax_amt,
d_discount = d_discount + s_discount
WHEN NOT MATCHED THEN
INSERT (D.TIME_ID ,D.STORE_ID ,D.REGION_ID,
D.PARTS ,D.SALES_AMT ,D.TAX_AMT ,D.DISCOUNT)
VALUES (
S.TIME_ID ,S.STORE_ID ,S.REGION_ID,
S.PARTS ,S.SALES_AMT ,S.TAX_AMT ,S.DISCOUNT);
Oracle8i Implementation:
In Oracle8i, you could choose to implement it as a sequence of DML statements.
UPDATE
(SELECT
S.TIME_ID ,S.STORE_ID ,S.REGION_ID,
S.PARTS s_parts ,S.SALES_AMT s_sales_amt ,S.TAX_AMT s_tax_amt ,S.DISCOUNT s_discount,
D.PARTS d_parts ,D.SALES_AMT d_sales_amt ,D.TAX_AMT d_tax_amt ,D.DISCOUNT d_discount
FROM SALES_JUL01 S, SALES_FACT D
WHERE D.TIME_ID = S.TIME_ID
AND D.STORE_ID = S.STORE_ID
AND D.REGION_ID = S.REGION_ID) JV
SET d_parts = d_parts + s_parts,
d_sales_amt = d_sales_amt + s_sales_amt,
d_tax_amt = d_tax_amt + s_tax_amt,
d_discount = d_discount + s_discount
;
INSERT INTO SALES_FACT (
TIME_ID,STORE_ID ,REGION_ID,
PARTS ,SALES_AMT ,TAX_AMT ,DISCOUNT)
SELECT
S.TIME_ID ,S.STORE_ID ,S.REGION_ID,
S.PARTS ,S.SALES_AMT ,S.TAX_AMT ,S.DISCOUNT
FROM SALES_JUL01 S
WHERE (S.TIME_ID, S.STORE_ID, S.REGION_ID) NOT IN (
SELECT D.TIME_ID, D.STORE_ID, D.REGION_ID
FROM SALES_FACT D
)
;
Alternatively, you could create a procedural implementation. It would need to check every load record against the target to find if the record already exists; only then could it decide whether to insert or update the data.
Both of these approaches suffer from deficiencies in performance and usability. The new merge command overcomes these deficiencies, processing the conditional INSERT-or-UPDATE within a single statement. The data is scanned only once, and the appropriate DML command is issued, either serially or in parallel.
The new merge command brings major performance benefits by providing an optimized internal feature for the common Upsert task within ETL processing. Furthermore, it simplifies the development of transformation processing inside the database - using Oracle9i as the transformation engine.