Case 2. Analyzing Performance Degradation Caused by DOP (Degree of Parallelism) Downgrade
Today's databases store not only DW (Data Warehouse) and ODS (Operational Data Store) workloads but also OLTP systems, and the data volumes involved are often very large. The major functions Oracle provides to process large data effectively are partitioning and parallel processing. Partitioning reduces segment size by applying the partition option (range, list, hash, subpartition) appropriate to the task, and parallel processing allows multiple processes to be used for a single task. In particular, because parallel processing actively utilizes hardware resources (multiple cores, multiple I/O channels, multiple nodes in a RAC environment, and, from 11gR2, even multiple buffer caches through the In-Memory PX function), it can dramatically improve query performance.
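For reference, the following is a minimal sketch of how a range-partitioned table and a PARALLEL hint can be combined; the table and column names (SALES_HIST, SALE_DT, AMT) are hypothetical and are not taken from this case.

SQL> CREATE TABLE SALES_HIST (
       SALE_DT  DATE,
       AMT      NUMBER
     )
     PARTITION BY RANGE (SALE_DT) (
       PARTITION P2013 VALUES LESS THAN (TO_DATE('2014-01-01','YYYY-MM-DD')),
       PARTITION PMAX  VALUES LESS THAN (MAXVALUE)
     );

SQL> SELECT /*+ FULL(s) PARALLEL(s 8) */ COUNT(*)
       FROM SALES_HIST s;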
However, the limitation of parallel processing is that it can only be used within a limited pool of resources. In other words, because the number of parallel slave processes is limited, query performance can slow down drastically when no parallel slave process can be allocated. As a safeguard for such cases, Oracle provides a statement queuing function from 11gR2, but this is not a perfect solution. Therefore, this case study introduces a method for diagnosing and analyzing performance issues caused by DOP downgrade using MaxGauge.
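For reference, statement queuing in 11gR2 is enabled when PARALLEL_DEGREE_POLICY is set to AUTO; once the number of active parallel servers reaches PARALLEL_SERVERS_TARGET, new parallel statements wait in a queue instead of being downgraded. A minimal sketch, with illustrative values only:

SQL> ALTER SYSTEM SET PARALLEL_DEGREE_POLICY = AUTO;
SQL> ALTER SYSTEM SET PARALLEL_SERVERS_TARGET = 32;  -- illustrative value, size to your system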
The flow of investigating the cause of performance degradation due to DOP downgrade through MaxGauge is as follows.
Image 1. Flow of Investigating Performance Degradation due to DOP downgrade
Step 1.1 Real-time Monitoring through PQ Session Frame
Upon receiving a notification from the developer about a batch job delay, the first thing to do is confirm the module name with the developer and then check the PQ sessions currently in progress using the PQ Session Frame.
Module Name Setting Method
SQL> exec DBMS_APPLICATION_INFO.SET_MODULE('BATCH_CASE2','');
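To verify that the module name has actually been applied to the session, one way (outside of MaxGauge) is to query V$SESSION, for example:

SQL> SELECT sid, serial#, module, status
       FROM v$session
      WHERE module = 'BATCH_CASE2';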
Image 2. Checking the PQ Sessions Currently in Progress through the PQ Session Frame
(!) Check Results: The BATCH_CASE2 module does not appear in the PQ Session Frame. However, you can see that the only queries currently being processed in parallel are those executed from SQL*Plus under the [OOW] schema.
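As a supplementary cross-check outside of MaxGauge, the PQ sessions currently active in the database can also be listed from V$PX_SESSION; the DEGREE and REQ_DEGREE columns show the granted versus requested DOP.

SQL> SELECT qcsid, sid, degree, req_degree
       FROM v$px_session
      ORDER BY qcsid, sid;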
Step 1.2 Real-time Monitoring Through Active Sessions Frame
To confirm the current activity status of the BATCH_CASE2 module, check the contents of the Active Sessions Frame.
Image 3. Checking the Sessions Currently in Progress through the Active Sessions Frame
(!) Check Results: The BATCH_CASE2 module has been running for [216] seconds and is being processed by a single process.
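If needed, the same information can be cross-checked directly from V$SESSION; LAST_CALL_ET gives the elapsed seconds of the current call.

SQL> SELECT sid, serial#, status, last_call_et, sql_id
       FROM v$session
      WHERE module = 'BATCH_CASE2'
        AND status = 'ACTIVE';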
Step 1.3 Session Monitoring by Using the Session Details
To check the session details, double-click the corresponding session in the Active Sessions Frame to open the Session Details screen. The Session Details screen is used to closely monitor the details of a single particular session. Among the available information, check the SQL Text and the Wait Event Total.
Image 4. Session Detail -> Check the SQL Text from the Delta Screen
Image 5. Session Detail -> Check the Wait Event Total from the Sigma Screen
Image 6. Check Execution Plan
(!) Check Results: FULL and PARALLEL hints have been applied within the SQL statement. Checking the execution plan with a client tool (LitePlus is used in this document) confirms that PQ processing is possible for this SQL. However, as shown on the Wait Event Total screen, the SQL did not generate any PX-related wait events, which normally occur during PQ processing. In other words, the SQL was processed serially at run time. This is because the SQL could not acquire PQ slaves at run time, and hence the DOP was downgraded.
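The downgrade can also be cross-checked against the database's own cumulative parallel operation statistics. As a supplementary sketch (these are instance-wide counters, not per-SQL figures):

SQL> SELECT name, value
       FROM v$sysstat
      WHERE name LIKE 'Parallel operations%downgraded%';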
Step 1.4 Today’s Batch Job Execution Period Analysis on the PA Performance Trend
To analyze why the BATCH_CASE2 module was not allocated any PQ slave processes, move to the point at which the BATCH_CASE2 module's activity started.
Image 7. Today's Batch Job Starting Point's Database Activity Status
(!) Check Results: You can see that at the starting point of the BATCH_CASE2 module, parallel processing was already in progress under the [OWK] schema using multiple (50) PQ slave processes. As a result, the BATCH_CASE2 module could not be allocated any PQ slave processes.
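To see how many PQ slave processes are in use at a given moment directly in the database, V$PX_PROCESS_SYSSTAT can be queried as a supplementary check (statistic names may vary slightly by version):

SQL> SELECT statistic, value
       FROM v$px_process_sysstat
      WHERE statistic LIKE 'Servers%';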
Step 1.5 Yesterday’s Batch Job Execution Period Analysis on PA Performance Trend
To ensure the accuracy of the analysis, move to yesterday's starting point, at which BATCH_CASE2 was executed normally.
Image 8. Yesterday's Batch Job Starting Point's Database Activity Status
(!) Check Results: You can see that the BATCH_CASE2 module performed parallel processing after being allocated PQ slave processes.
Step 1.6 PA Parameter Check
In this way, when PQ slaves cannot be allocated at a particular moment depending on the database environment, the very first Oracle parameter to check is PARALLEL_MAX_SERVERS. To check this parameter, use the PA Parameter screen.
Image 9. Oracle Parameter
(!) Check Results: The value of PARALLEL_MAX_SERVERS is set to 50.
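Outside of MaxGauge, the same parameter can also be checked directly in SQL*Plus, for example:

SQL> SHOW PARAMETER parallel_max_servers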
When main batch jobs that should be processed in parallel are processed serially instead, serious performance degradation results. Therefore, DBAs need to be familiar with PQ processing, the related parameter settings, and how they operate.
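As a closing note, if silent downgrading is unacceptable for a particular batch job, one general Oracle option (mentioned here for reference, not as part of this case) is PARALLEL_MIN_PERCENT: when it is set, a statement that cannot acquire the specified minimum percentage of the requested parallel servers fails with ORA-12827 instead of being quietly downgraded, so the shortage is detected immediately.

SQL> ALTER SESSION SET PARALLEL_MIN_PERCENT = 50;  -- illustrative value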