The use of one Knowledge Module technology rather than another may dramatically change the performances:
– JDBC driver access versus database native loader
– INSERT.SELECT versus SELECT.INSERT
– If there is a Knowledge Module dedicated to your database, you should always prefer this one to benefit from database specifics
– Optimizer hints
– Multiple transaction levels
– Commit policies
1. Reduce the data movement between servers.
– Correctly locate the staging area.
– Correctly locate the transformations – filters, joins and mappings.
– Use Journalization / Change Data Capture.
2. Accelerate the data movement by selecting best loading strategies. The methods used to move a data set from one server to another are determined by the Loading Knowledge Module (LKM) selected. ODI uses tree main methods to transfer data from one server to another:
– Data flowing through the Agent
– Data flowing through loaders
– Specific methods
Data flowing through the Agent: When data flows through the Agent following parameters can help us tune the data flow:
– Array Fetch in the source data server definition defines the size of the batches of records read from the source at a time and stored within the Agent.
– Batch Update in the target data server definition defines the size of the batches of records written to the target.
The Batch Update/Array Fetch configuration is a compromise between network and Agent overhead.With a highly available network, you can keep low values (<30, for example).With a poor network, you can use larger values (100 and more). Array Fetch/Batch Update typical values are between 30 and 500.
Data Flowing through Loaders: When loaders are used to transfer data, the Agent delegates the work to external loading programs such as Oracle SQL*Loader, Microsoft SQLServer BCP, etc.Typically, data is extracted by the source system native loader to a file, and then this file is copied to the target machine and then loaded using the target native loader.Such calls require the loader to be installed on the machine running the Agent. It is also possible to use an ODI built-in unloader command called OdiSQLUnload.Loaders usually provide the best performances.
Specific methods: These methods include engine-specific loading methods, including:
– Server to Server communication through technologies similar to Oracle Database links (using OCI) or Microsoft SQLServer Linked Servers.
– File to Table loading specific methods such as Oracle External Tables.
– Creating AS/400 DDM Files and loading data through a CPYF command.
– Using an Unloader/Loader method through UNIX pipes instead of files for very large volumes.
Usually, the recommendation to achieve optimal performance is to use the KMs that leverage most of the specific native features of both the source the target technologies.
For example, using a SQL to SQL (data flowing through the Agent) strategy from Oracle to Oracle is not a good idea, since there are methods such as DBLINKS or SQL*Loader that are several times faster.
3. Perform large joins optimally and locate them correctly: Joins are the second most time consuming activity in data integration. Join capabilities vary from one engine to another. Having the staging area on a server with the most available CPU, and joins running on this staging area is usually the best choice. Because ODI generates SQL for each of the joins involved in a transformation, analyzing the underlying database execution plans for these SQL queries can help tune a Mapping / Integration Interface for better performance. It is also important to understand that joins may replace other transformations with better performances. For example, a filter such as:
TABLE1.COLUMN1 in (SELECT COLUMN2 from getObjectName(TABLE2))
can be replaced with
a join between TABLE1.COLUMN1 and TABLE2.COLUMN2.
4. Strategies for Integrating Data: Prevent useless joins and accelerate data integration to the target by selecting an appropriate integration strategy. The number of operations required for integration strategies (IKM) is directly related to the complexity of the strategies. It is recommended to avoid using unnecessarily complex IKMs when working with large volumes. Common examples of unnecessary operations:
– Using an incremental update knowledge module with DELETE TARGET and INSERT options selected. Using a control append knowledge module instead would have the same result with better performance.
– Checking the DISTINCT option for integration when duplicates are not possible or when duplicate records are caused by incorrectly defined joins. Performing a DISTINCT is a time consuming operation for a database engine.
5. Strategies for Checking Integrity: The number of constraints checked in a data integrity process (CKM) adds more operations and therefore slows down the overall integration process. It is recommended to avoid using useless checks when working with large volumes.
Example of unnecessary operations:
Checking three times in an integration chain the same data that has not changed.
Ref: Doc ID 423726.1