Many SAP work processes were left in “on hold” status with RFC info

I was contacted about a production issue: almost all SAP background work processes were in “on hold” status with RFC info in the SM50 work process monitor and did not seem to be moving. I was asked why. Read on for the reason, which I identified through analysis with transactions SM50, STAD and RZ12. This post covers:

  • The issue – SAP work process in “on hold” status with RFC info,
  • Trouble-shooting of SAP work process in “on hold” status,
  • Conclusion,
  • Impact of SAP work process on hold and
  • Further clarification

1. The issue

The SAP work process monitor (SM50) showed that the majority of SAP background work processes were in “on hold” status with the status information or reason “RFC response”, as shown in Figure 1 below.

Figure 1 SAP SM50 – WORK PROCESS IN “ON HOLD” STATUS WITH “RFC”

If you need help on SAP SM50, you can refer to my post on how to run SAP SM50/SM66.

2. Trouble-shooting

SM50 in Figure 1 shows that the BTC work processes on hold were related to “RFC response”. Figure 1 also tells us that all background processes were busy, which would be the result of long-running BTC processes caused by poor RFC call performance. Were those RFC calls going to an external system or to the local system? Did we have an RFC resource issue locally or remotely?

2.1 System had no free DIA WP for handling RFC calls

So I checked via SAP transaction SPBT whether the system had free DIA work processes for handling RFC calls. You can check this via SAP SMQS as well.

Figure 2 SAP SPBT – RFC resource availability check

Figure 2 shows that the system allows up to 25 DIA WPs for RFC processing, but at that moment there was no free DIA WP to handle any new RFC call because “ALL PBT resources are currently busy”. So you might wonder who was exhausting the system's RFC resources and what we could do to fix or mitigate the situation.

SAP transaction SPBT is mentioned in the section on how to validate RFC configuration in my post on how to configure an RFC server group via RZ12.

2.2 Who was exhausting system RFC resources?

Who was exhausting the system RFC resources? I checked this via the SAP work process monitor SM50. The result follows.

Figure 3 SAP SM50 – DIA WP consumption

Figure 3 told us that most DIA work processes were occupied by one single user, and this was not a normal online user. You might then wonder about the origin of those RFC calls.
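The checks in 2.1 and 2.2 boil down to two counts over a work process snapshot: how many BTC processes are on hold for RFC, and which user holds most of the busy DIA processes. Below is a minimal Python sketch of that reasoning. It does not read anything from SAP; the `WorkProcess` record, the user names and the snapshot rows are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WorkProcess:
    number: int
    wp_type: str  # "DIA" or "BTC"
    status: str   # e.g. "Running", "On Hold", "Waiting"
    reason: str   # e.g. "RFC", or "" when no reason is shown
    user: str


def on_hold_for_rfc(processes, wp_type="BTC"):
    """Return the work processes of one type that are on hold with reason RFC."""
    return [
        wp for wp in processes
        if wp.wp_type == wp_type and wp.status == "On Hold" and wp.reason == "RFC"
    ]


def busy_dia_by_user(processes):
    """Count busy (not idle/waiting) DIA work processes per user."""
    return Counter(
        wp.user for wp in processes
        if wp.wp_type == "DIA" and wp.status != "Waiting"
    )


# Hypothetical snapshot, typed in by hand for illustration only.
snapshot = [
    WorkProcess(0, "DIA", "Running", "", "RFC_USER"),
    WorkProcess(1, "DIA", "Running", "", "RFC_USER"),
    WorkProcess(2, "DIA", "Running", "", "RFC_USER"),
    WorkProcess(3, "DIA", "Running", "", "ONLINE01"),
    WorkProcess(4, "DIA", "Waiting", "", ""),
    WorkProcess(5, "BTC", "On Hold", "RFC", "BATCH01"),
    WorkProcess(6, "BTC", "On Hold", "RFC", "BATCH01"),
    WorkProcess(7, "BTC", "Running", "", "BATCH02"),
]

held = on_hold_for_rfc(snapshot)
top_user, top_count = busy_dia_by_user(snapshot).most_common(1)[0]
print(f"{len(held)} BTC work processes on hold for RFC")
print(f"Busiest DIA user: {top_user} with {top_count} work processes")
```

A single user dominating the busy DIA processes is the pattern that pointed me toward an RFC storm rather than normal online load.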

2.3 Where was the origin of the RFC calls?

I used STAD to find out which SAP job/program was firing those RFC calls, or where those RFC calls came from.

Figure 4 STAD – RFC calls

You can normally find the origin/parent of RFCs via SAP transaction STAD: display the detail info and then check the “Client” info. In this case, we found that the RFCs shown in Figure 4 were coming from another SAP system, where many work processes were running under a single online SAP user that needed information from the system I was looking into.

3. Conclusion

The SAP work processes were in “stop” or “on hold” status with the reason “RFC response” because of contention for RFC resources in the system. The available RFC resources (DIA work processes) were occupied by a storm of RFC calls from a remote SAP system.

4. Impact of SAP background work process in “On-hold”

The impact is in two areas: SAP background jobs could not start on time because the on-hold or stopped processes created a shortage of background work processes, and jobs that did start ran longer if they needed to make RFC calls.

4.1 Long-running background processes

The following screen shows that processing an IDoc took over one and a half hours, while it normally finishes in seconds.

Figure 5 Background process ran longer than normal

Figure 6 SAP STAD – Work process with big RFC time

Based on Figure 6, we know that over 99.9% of the runtime was spent on RFC calls. On further review, you can see that RBDAPP01 was making an RFC call that was supposed to be executed locally, and it spent most of the RFC time waiting. Please refer to my post explaining STAD statistical data if you would like to know the details.

4.2 Long waiting of background processes

The “Delay” column in the SAP SM37 screen shows how long a background job had to wait before it was dispatched to an idle BTC work process. The following screenshot gives you an idea of the impact of the issue.

Figure 7 SAP SM37 – Job delay due to shortage of BTC work processes created by RFC issue

You can see that the job delay varies; it depends on the job's scheduled time and the status of the system.
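As described above, the delay is simply the gap between when a job was scheduled to start and when a BTC work process actually picked it up. A small Python sketch of that arithmetic follows; the function name and the timestamps are illustrative assumptions, not values from Figure 7.

```python
from datetime import datetime


def job_delay_seconds(scheduled_start, actual_start):
    """Seconds a job waited for a free background work process (never negative)."""
    delay = (actual_start - scheduled_start).total_seconds()
    return max(0, int(delay))


# Hypothetical job: scheduled for 02:00:00, dispatched at 02:07:30.
scheduled = datetime(2024, 1, 15, 2, 0, 0)
started = datetime(2024, 1, 15, 2, 7, 30)
delay = job_delay_seconds(scheduled, started)
print(f"Delay: {delay} s")
```

Jobs scheduled while all BTC work processes were on hold accumulate a large delay, while jobs scheduled just before a process frees up see almost none, which is why the delay varies so much.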

5. Further clarification

Technically, you can fix the issue immediately by terminating the corresponding program/process in the remote system. In our case, I just added 8 more DIA WPs to the RFC resources via RZ12, since almost no dialog users were online at that early-morning system time, and the issue went away by itself about half an hour after the addition. So in a situation like this, if there is no business impact, simply waiting and letting time cure the issue can sometimes be enough. Be aware that allowing too many work processes for RFC can sometimes cause SAP work process deadlocks.

An SAP work process can be in “RFC” status for many reasons, such as the network, a remote server issue or an expensive RFC call. In this case, it was due to a local resource issue.

Normally, more BTC work processes than DIA WPs are seen left in “on hold” or “stop” status. A DIA WP is rolled out when it is in “waiting” status after the connection is established, so it becomes available for other online/RFC requests. This is typical SAP design. Interactive activities executed by online SAP users are handled by SAP DIA work processes, and you normally have many more SAP online users than available SAP DIA work processes.

Further follow-up is needed to understand the interface configuration, volume control and end-user activities in the remote system, so that a system configuration tuning or program design change can be made to avoid a recurrence of this issue.

Explanation of SAP STAD Single Statistical Records/data

SAP STAD is one of the SAP transactions I use most frequently in performance analysis. STAD can show you where time is spent across the technical components involved in a job/program execution, so it is often the first step I take to review the performance of a program/transaction that finished recently. Following my previous post on how to run and navigate through the SAP STAD transaction, I would like to share my understanding of STAD data, which is the basis for using STAD in performance analysis.

  1. Structure of STAD statistical data,
  2. Explanation of STAD statistical data and
  3. How to use STAD for performance analysis.

1 Structure of statistical data of a SAP transaction step

SAP collects various statistical data to record the performance of the technical components involved in executing a transaction step. To facilitate performance analysis, SAP groups the statistical data into one main record and many optional sub-records. Whether a specific type of optional sub-record exists depends on the context of the transaction step and on system parameter settings (see my post on how to run ST03N). The following are the statistical sub-records I often deal with in my SAP environment:

  1. DB (sub-records) – Statistical data related to the performance of database operations in a transaction step.
  2. DB procedure (sub-records) – Statistical data related to the execution of database stored procedures in a transaction step. This is more important for SAP SCM boxes where LiveCache is used.
  3. Table (sub-records) – show the most expensive tables accessed in the transaction step according to the setting of stat/tabrec.
  4. RFC (sub-records) – show an overview of RFC statistics from the RFC client, RFC server, RFC client destination and RFC server destination sub-records according to the setting of stat/rfcrec.
  5. Client Info (sub-records) – show the origin of the transaction step, such as job name, program name and system name.

The main record for the statistical data of a single transaction step is like the header record of an SAP business document. It contains statistical data related to the time profile of the transaction step, task and memory information, and the data volume involved in the transaction step. Sub-records are like the detail lines of a business document. One business document can have only one header record, with one or more detail records.

One execution of a program can have more than one transaction step. SAP performance analysis normally focuses on the most expensive steps.

2 Explanation of STAD statistical data

An SAP single statistical record has one main record and many sub-records. SAP provides corresponding buttons in the STAD detail screen to help you quickly locate the information of the main record and sub-records. Table 1 lists some buttons that frequently appear in my SAP environment.

Button | Related statistics
Time | Time profile of transaction.
DB | Database time breakdown according to type of database operations.
DB Procedure | Details on procedure calls to LiveCache.
Task/Memory | Show memory usage at the end of step.
Table | Database time – table profiles.
RFC | RFC statistics for RFC client, RFC Server, client destination and/or server destination sub-records.
Bytes | Data transferred to application.
Client Info | Origins – user, job, server which the transaction step belongs to.

Table 1 SAP STAD – navigation buttons

When a corresponding STAD sub-record exists, the related button shows up in the SAP STAD detail screen. Buttons other than those listed in Table 1, such as “HTTP”, are not covered in this post.

2.1 SAP STAD – Time Profile of a Transaction Step

The top of the SAP STAD “Single Statistical Records – Details” screen is the “Analysis of time in work process” section, similar to Figure 1. You can get back to the time profile screen by clicking the “Time” button from other STAD detail screens.

Figure 1 SAP STAD – Analysis of time consumption

Figure 1 relates to an individual statistical record of a background step. The response time in Figure 1 is actually the job duration you would see in SM37 when the program (RMMRP000) is the only step of a background job. In Figure 1, all time components listed in the “Response time” box are independent/exclusive.

Response time = wait for work process time + processing time (ABAP) + generation time + load time + roll time for rolling in work data + database time + enqueue time for logical SAP locks + roll wait time (not including task types RFC/CPIC/ALE). When a DB procedure is used between the application server and liveCache, a further component, “DB procedure”, is added to the response time.
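To make the additive structure concrete, here is a minimal Python sketch that sums the components listed above and reports which one dominates. The `TimeProfile` class, its field names and the millisecond values are my own illustrative assumptions; they are not STAD field names or figures from Figure 1.

```python
from dataclasses import dataclass, fields


@dataclass
class TimeProfile:
    """Exclusive time components of one transaction step, in milliseconds."""
    wait_for_wp_ms: int = 0
    processing_ms: int = 0
    generation_ms: int = 0
    load_ms: int = 0
    roll_in_ms: int = 0
    database_ms: int = 0
    enqueue_ms: int = 0
    roll_wait_ms: int = 0
    db_procedure_ms: int = 0  # only relevant when liveCache is involved

    def response_time_ms(self):
        """Response time as the plain sum of all exclusive components."""
        return sum(getattr(self, f.name) for f in fields(self))

    def largest_component(self):
        """Name of the component that contributes the most time."""
        return max(fields(self), key=lambda f: getattr(self, f.name)).name


# Hypothetical step: database time dominates.
step = TimeProfile(
    wait_for_wp_ms=5,
    processing_ms=1200,
    load_ms=3,
    roll_in_ms=2,
    database_ms=3400,
    enqueue_ms=10,
    roll_wait_ms=380,
)
print(step.response_time_ms(), step.largest_component())
```

Looking at the largest component first is the same habit I follow on the STAD time screen: it tells you which sub-record to open next.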

For a dialog task, response time is measured at the SAP application server, from the time a request is received by the application server to the time the application server finishes processing and sends the data to the client. Please note that response time starts to tick when the request arrives at the application server, not when the user sends it, and it stops ticking when the application finishes processing and the last transfer of data from the application server to the client is sent. The time used to send the request to the SAP application server and the time used to send the last data back to the client are not part of the STAD response time, but they are part of the end user's online experience. Network and client impact on dialog response time can be measured by GUI time and Net time (Figure 1, bottom right corner under Frontend). For other task types, the frontend statistics (GUI and Net time) are not relevant.

CPU time is not a separate time component but the sum of the CPU utilization of each operation listed under “Response time”, such as ABAP processing, program/screen load and generation, roll-in and enqueue operations, excluding database operations.

RFC+CPIC time: time spent by all RFC calls (as a client) in the transaction step. This time can be bigger than the response time if several long asynchronous RFC calls are initiated in parallel.
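Why the RFC time can exceed the response time is easier to see with numbers. The Python sketch below uses a simplified model of my own: each call is a (start, end) pair in milliseconds relative to the start of the step, RFC time is the sum of the individual call durations, and the elapsed time is the wall-clock span the calls cover. The intervals are invented.

```python
def rfc_time_ms(calls):
    """Sum the duration of every client RFC call, regardless of overlap."""
    return sum(end - start for start, end in calls)


def elapsed_ms(calls):
    """Wall-clock span from the first call start to the last call end."""
    return max(end for _, end in calls) - min(start for start, _ in calls)


# Three hypothetical asynchronous calls that overlap almost completely.
parallel_calls = [(0, 900), (50, 950), (100, 1000)]

total_rfc = rfc_time_ms(parallel_calls)
wall_clock = elapsed_ms(parallel_calls)
print(f"RFC time {total_rfc} ms over {wall_clock} ms of elapsed time")
```

Because overlapping calls are each counted in full, three parallel calls of about 900 ms add up to far more RFC time than the step itself takes.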

Wait for work process – the time from when the transaction step is put in the queue because of a shortage of work processes to when a free work process is available to execute it.

Processing time – time used in executing ABAP/4 statements.

Database time – time used by the database to execute the needed database operations, such as reading a table, changing existing table data or inserting new data into a table. Database time is measured on the application server side, so it includes the network time between the application server and the database server where applicable. Network latency can therefore affect transaction performance when the SAP server/instance executing the transaction runs on a different server from the database. Time used in reading data from a buffered table on the application server is not part of database request time but part of processing time. Time spent between the application server and liveCache (SAP SCM) is not part of database time here.

For all other fields in Figure 1, you can get the SAP online explanation in the transaction by placing the cursor on the corresponding data and pressing “F1”.

In a normal ECC environment, the significant components of response time are processing time, database time and roll wait time (when synchronous RFC is used). Load time and generation time should be minimal or zero; otherwise, this can indicate a memory issue (either a setting or a physical limit) or CPU contention.

After analyzing the response time profile, you can navigate to a specific time component/sub-record should you need further details.

Checking the time profile of a transaction is often the first step in reviewing the performance of a business transaction/job. It can quickly point to the area you should focus on in SAP program/transaction performance analysis.

2.2 SAP STAD – Analysis of ABAP/4 Database Request

Clicking the “DB” button in the STAD details screen brings you to a screen similar to Figure 2.

This section breaks database time down into the different database operations, so you can see each operation's contribution to database time and its performance.

Figure 2 SAP STAD – Database time breakdown on type of ABAP request

For a detailed explanation of an individual field in the Figure 2 screen, click the corresponding data and press the “F1” key to get the SAP online documentation.

2.3 SAP STAD – Analysis of Table Access

This breaks database time down by the tables involved, showing the types of operations on a particular table in a transaction step and how much time the step spent on that table. A program might have accessed many tables. For performance reasons, not all tables accessed in the transaction step are recorded, only up to the number specified by the SAP online parameter stat/tabrec. The default setting is 5, which is enough for performance analysis.

Figure 3 SAP STAD – Database time breakdown on tables

Direct reads refer to reading a table via the full primary key. Other types of table reads are sequential reads. Changes include update, insert and delete database requests.

2.4 SAP STAD – Analysis of Remote Function Call

When a transaction step is an RFC step itself, or when it has initiated at least one RFC call, at least one RFC sub-record exists for the transaction step by default. You can click the “RFC” button to review the RFC sub-records. You might get a display similar to Figure 4.

Figure 4 SAP STAD – RFC calls overview

The Figure 4 screen is the RFC overview screen of a transaction step. It has two portions, client and server. Depending on the step, you might see only the client portion or only the server portion. The RFC client portion shows that 5 connections involving 3 destinations made a total of 98 client calls, with a calling time of 3,150 ms and an execution time of 2,713 ms. The RFC server portion shows that 1 connection to 1 destination executed only 1 RFC call, with a calling time of 27,368 ms against a remote execution time of 27,367 ms.

Fields | Explanation
Connections | Number of connections between client and server system.
Destination | Number of distinct destinations from RFC client/server destination sub-records. Destination indicates the system where the call is sent to by the client. Destination is normally an entry which is configured in SM59 in the client system where the RFC is initiated.
Users | Number of distinct users from Client/Server destination sub-records.
Calls | Sum of number of calls from Client/Server destination sub-records. For RFC client, this is the number of calls made to the server. For RFC server, this is the number of calls executed in the box where STAD is run.
Calling Time | Sum of calling time from Client/Server destination sub-records. The time includes network connection/traffic time.
Remote Execution | Sum of remote execution time from each destination sub-record. Time spent in executing the RFC on the server side.
Idle | Sum of idle time between two RFC calls when connection is open.
Sent | Sum of data sent to server for client call, or data sent back to client for server process, based on RFC client/server destination records.
Received | Client – sum of data received from server; Server – sum of data received from client. All data are sourced from client/server destination sub-records.

Table 2 SAP STAD – RFC Overview fields explanation
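The overview fields are sums and distinct counts over the destination sub-records. The Python sketch below rebuilds the client-side totals quoted for Figure 4 (5 connections, 3 destinations, 98 calls, 3,150 ms calling time, 2,713 ms remote execution) from five invented sub-records. Only the totals come from the post; the per-record split, the destination names and the "one sub-record per connection" simplification are assumptions. The `overhead_ms` figure is my own derived number: calling time minus remote execution time, which, going by the field explanations, covers network and connection time.

```python
from dataclasses import dataclass


@dataclass
class DestinationRecord:
    destination: str
    calls: int
    calling_ms: int
    remote_exec_ms: int


def overview(records):
    """Aggregate destination sub-records into overview-style totals."""
    calling = sum(r.calling_ms for r in records)
    remote = sum(r.remote_exec_ms for r in records)
    return {
        "connections": len(records),  # assumes one sub-record per connection
        "destinations": len({r.destination for r in records}),
        "calls": sum(r.calls for r in records),
        "calling_ms": calling,
        "remote_exec_ms": remote,
        "overhead_ms": calling - remote,  # derived, not a STAD field
    }


# Invented split that adds up to the client totals quoted in the text.
client_records = [
    DestinationRecord("DEST_A", 40, 1200, 1000),
    DestinationRecord("DEST_A", 30, 900, 800),
    DestinationRecord("DEST_B", 20, 700, 613),
    DestinationRecord("DEST_B", 5, 200, 180),
    DestinationRecord("DEST_C", 3, 150, 120),
]

client_overview = overview(client_records)
print(client_overview)
```

On the server side of the same figure, the gap is only 1 ms (27,368 ms against 27,367 ms), so practically all of that call's time was spent executing the function module.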

If you click the highlighted fields in Figure 4, you can see the available SAP statistics for the RFC client, RFC client destination, RFC server and RFC server destination sub-records.

2.4.1 RFC Client Destination statistics

If you click the highlighted number in the “Connections” field of the client portion of the STAD RFC overview screen, you see one or more client destination records similar to Figure 5.

Figure 5 SAP STAD – RFC client destination record details

An RFC client destination record has the following information:

  • RFC type – synchronous (wait) or asynchronous (non-wait). Here it is synchronous.
  • RFC user name – SAP user account used to execute the RFC call. Here it is bgomusr.
  • Destination – where the RFC is going to be executed. Here it is CC*USD.
  • Instance and IP address – the server where the RFC call is requested and its IP address.
  • Partner instance and IP address – the server where the RFC call is executed and its IP address.
  • Calls – number of RFC calls made to the destination in the transaction step.

Please refer to the table above for an explanation of other fields such as calling time.

The name of a destination is normally configured in SM59, except for some destination names reserved internally by SAP. The instance name is what you can see in SM51 for the SAP system. Calls is the total number of calls made over the connection to the destination. Calling time, remote execution time and idle time are the sums of the corresponding times of each call over this open connection to the destination. Data send/receive time and the size of sent/received data are likewise summed over each call on this open connection. You might see an SAP “transaction code” field as well in the client destination sub-record when applicable.

2.4.2 RFC Client statistics

If you click the highlighted number in the “Calls” field of the client portion of the STAD RFC overview screen, you see one or more RFC client sub-records similar to Figure 6.

Figure 6 SAP STAD – RFC client record details

An RFC client sub-record shows:

  • Call number – RFC call sequence. Here it is 1, which means FM XIPAY_CC_AUTHORIZATION is the first call to the destination CC*USD.
  • Destination – where the FM XIPAY_CC_AUTHORIZATION is executed. Here it is CC*USD.
  • Function name – name of the function module that needs to be executed by the server. Here it is XIPAY_CC_AUTHORIZATION.
  • Calling time – duration of the RFC call from begin to end as seen from the client side. Here it is 2,647 ms.
  • Remote exec. time – time needed to execute FM XIPAY_CC_AUTHORIZATION in the destination. Here it is 2,501 ms.
  • Idle time – sum of idle time between two RFC calls. Here it is 0 ms.
  • Data send time and data sent – time needed to send data to the server and the amount sent. Here 980 bytes were sent.
  • Data receive time and data received – time needed to receive data from the server and the amount received. Here 1,615 bytes were received.

The destination might be a different SAP server/system, but it could also be the same server/system where SAP transaction STAD is executed.

2.4.3 RFC Server Destination statistics

If you click the highlighted number in the “Connections” field of the server portion of the STAD RFC overview screen, you see one or more server destination records similar to Figure 7.

Figure 7 SAP STAD – RFC Server destination sub-record

An RFC server destination record shows:

  • Caller user name – SAP user name used to start the RFC call. Here it is bgomusr.
  • Caller client number – SAP client number of the system where the RFC call is started. Here it is 4*.
  • Destination – the system where the RFC call is executed.
  • Instance and IP address – the specific server of the destination where the RFC call is executed and its IP address.
  • Partner instance and IP address – the server where the RFC call is requested and its IP address.
  • Calls – number of calls made to the destination in the transaction step.

Other fields have explanations similar to those for the client destination. You might see a transaction code as well: execution of that transaction led to the execution of the RFC call to the destination.

The RFC server destination record tracks the total number of RFC calls the server processes. Various remote function modules might be called, so it does not make sense to track the function module in this context.

2.4.4 RFC Server Call statistics

If you click the highlighted number in the “Calls” field of the server portion of the STAD RFC overview screen, you see one or more RFC server sub-records similar to Figure 8.

Figure 8 SAP STAD – RFC Server Sub-record

An RFC server sub-record shows:

  • Call number – RFC call sequence. Here it is 1, which means FM APPLICATION_IDOC_POST_IMMEDIAT is the first call to the destination *01.
  • Destination – where the FM APPLICATION_IDOC_POST_IMMEDIAT is executed. Here it is *01.
  • Function name – name of the function module that has been executed. Here it is FM APPLICATION_IDOC_POST_IMMEDIAT.
  • Calling time – duration of the RFC call from begin to end as seen from the client side. Here it is 27,368 ms.

Please refer to the RFC client statistics for an explanation of the remaining fields.

The destination is one of the SM59 configurations in the client system, which might be a different system from the one where the remote function module was executed. This means you might not find the destination configured in the system where you run STAD.

Last but not least on STAD RFC statistics, I would like to mention the controls SAP has in place to limit the number of RFC sub-records a transaction step keeps. One transaction step can make many RFC calls to many destinations. For performance reasons, SAP does not generate an RFC sub-record for each function called and each destination involved. The SAP online parameter stat/rfcrec controls the number of RFC sub-records a transaction step is allowed to keep. The default setting is 5. When more than 5 RFC client calls are initiated from a transaction step, SAP keeps statistics only for the 5 most expensive RFC calls. When more than 5 destinations (actually connections) are involved in RFC calls in a transaction step, SAP keeps statistics only for the 5 most expensive client destinations. The same goes for RFC server and RFC server destination statistics. An RFC call is more expensive if its execution time is longer. Since the 5 most expensive calls are captured, those statistics are enough for RFC performance analysis. Increasing stat/rfcrec could be dangerous to workload collector performance, and the larger workload that results might affect system performance.
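The effect of this limit is a "keep the N most expensive" selection. Here is a minimal Python sketch of that selection, assuming expense is ranked by execution time as described above. The function module names and timings are invented, and this only illustrates the idea, not how the SAP kernel implements it.

```python
import heapq


def keep_most_expensive(calls, limit=5):
    """Keep only the `limit` calls with the longest execution time."""
    return heapq.nlargest(limit, calls, key=lambda call: call[1])


# Hypothetical (function module, execution time in ms) pairs for one step.
calls = [
    ("FM_A", 120),
    ("FM_B", 2501),
    ("FM_C", 15),
    ("FM_D", 640),
    ("FM_E", 80),
    ("FM_F", 310),
    ("FM_G", 5),
]

kept = keep_most_expensive(calls)
dropped = [name for name, _ in calls if name not in {k[0] for k in kept}]
print("kept:", kept)
print("dropped:", dropped)
```

The calls that fall off the list are the cheapest ones, which is why the default of 5 is usually enough to find where the RFC time went.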

2.5 SAP STAD – Task and Memory information

Clicking the “Task/Memory” button brings you to a screen similar to Figure 9.

Figure 9 SAP STAD – Task and memory information

The task and memory information sub-record shows:

  • Terminal ID – only available for SAP dialog work processes. When available, it is the name of an end-user device, such as a computer name, an IP address or a server name, from which the transaction step was initiated. RFC calls, HTTP requests, ALE and dialog requests are executed in SAP via dialog work processes. Place your cursor on the terminal field and press F1 to see the following standard SAP documentation: “Name of the terminal or presentation server from which the dialog step was initiated and to which the
  • Terminal In/Out message: per the SAP online documentation, “Number of Bytes required by the presentation server (terminal) for communication to and from the dispatcher work process to control the dialog.” This field makes sense for dialog requests, HTTP requests etc., but not for RFC/ALE and background tasks. When a dialog transaction is executed over a poor network connection or with poor SAP GUI performance, large terminal in/out messages can affect performance.
  • Work Process No: the number of the SAP work process used to execute the transaction step.
  • Trans. – ID: an SAP internally assigned ID used to locate each specific execution of an SAP transaction or program. All steps of a specific executed program/transaction have the same transaction ID. All RFC child processes launched from the same transaction step with the same function module bear the same transaction ID, even though it is different from the parent program's, and each of them has a different session ID.
  • Session-ID: an SAP internally assigned ID used to locate a specific session where the transaction/program is executed. For GUI, each SAP GUI window is a session with its own session ID. All dialog transaction steps executed in the same SAP GUI window have the same session ID.

Terminal ID and session ID are used to locate the device/GUI related to a transaction step of an online transaction; for a background job, this is replaced by the job name. The transaction ID is used to identify all steps/activities of one execution instance of an SAP transaction/program. When a program is executed, one or more business documents can be created or changed.
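Since the transaction ID ties together all steps of one execution, grouping statistical records by it gives you a per-execution view. The Python sketch below shows that grouping on a few invented records. The dictionary keys, IDs and program names are hypothetical; real STAD records carry many more fields.

```python
from collections import defaultdict


def steps_by_transaction(records):
    """Group statistical records by their transaction ID."""
    groups = defaultdict(list)
    for record in records:
        groups[record["trans_id"]].append(record)
    return dict(groups)


# Hypothetical records: a parent program with two steps, and two RFC child
# processes that share one transaction ID but run in different sessions.
records = [
    {"trans_id": "TX-A", "session_id": "S1", "program": "PARENT_REPORT", "response_ms": 1200},
    {"trans_id": "TX-A", "session_id": "S1", "program": "PARENT_REPORT", "response_ms": 300},
    {"trans_id": "TX-B", "session_id": "S2", "program": "CHILD_RFC", "response_ms": 450},
    {"trans_id": "TX-B", "session_id": "S3", "program": "CHILD_RFC", "response_ms": 500},
]

grouped = steps_by_transaction(records)
totals = {tid: sum(r["response_ms"] for r in steps) for tid, steps in grouped.items()}
child_sessions = {r["session_id"] for r in grouped["TX-B"]}
print(totals, child_sessions)
```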

2.6 SAP STAD – Bytes Transferred

This is to show size of data requested by SAP application to run the transaction step.

Figure 10 SAP STAD – Application data volume

2.7 SAP STAD – Client Info or Extended Passport

Here “Client” means the parent that is responsible for generating the transaction step. This is different from the RFC client, which shows which RFC calls were started by the transaction step.

Figure 11 SAP STAD – Client info

This screen tells you a lot about the parent of the transaction step you are reviewing. The screen above shows that the transaction step is the result of executing a batch job, “S2C-OA_US_REPROC_ST64_ORD_SINGLE”, under user “BGOMUSR” from system/server “*00”. All transactions executed by the same user from the same SAP GUI window have the same root context. All steps of an SAP job have the same root context as well.

In my version, the system ID and server information are determined during job creation. When an SAP system is migrated, the old server name is still shown unless the job is deleted and recreated via SAP transaction SM36 etc. after the migration.

3 SAP STAD and performance analysis

At the beginning of the post, I mentioned that SAP STAD is used to review the statistical data of a recent transaction. What counts as “recent”? This is defined by the system parameter stat/max_files. SAP stores statistics hourly in files at OS level, and the oldest files are overwritten once the number of files reaches what stat/max_files specifies. Where the stat file is stored is controlled by another parameter, stat/file.
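With one statistics file per hour, the history you can still see in STAD is roughly the file limit expressed in hours. The Python sketch below does that arithmetic. The value 48 is only an example I picked for illustration, so check the actual parameter value in your own system; the result is approximate because the current hour's file is still being written.

```python
from datetime import datetime, timedelta


def oldest_available(now, max_files, hours_per_file=1):
    """Approximate start of the time window still covered by statistics files."""
    return now - timedelta(hours=max_files * hours_per_file)


# Illustrative values only: 48 hourly files, checked at noon on 10 January.
now = datetime(2024, 1, 10, 12, 0)
window_start = oldest_available(now, max_files=48)
print(f"Records older than about {window_start} have been overwritten")
```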

I hope this post has helped you get a better understanding of SAP single statistical records. In a future post, I might share tips on how to use STAD for performance analysis and how STAD can work with other tools in SAP performance analysis.

Mass VS Individual deletion – Does it matter to SAP program performance?

My previous post, “SAP ST12 Trace – SQL performance analysis”, mentions that we can use a mass/array database operation to replace many individual executions of the same SQL statement to improve database access performance. This post goes through a real business case where code was changed to use an array/mass SQL deletion in place of a repeatedly executed single deletion to delete records from a database table. After the change, the business job runtime improved by over 90%. This post covers:

  1. The background of SAP ABAP program performance tuning,
  2. The solution of SAP program performance tuning and
  3. The result of SAP program performance tuning.

1. The background

The volume of a business process is going to increase. To prepare for the higher volume, we were asked to check whether the program's performance could be tuned further. For that purpose, the business program was tested and traced via SAP transaction ST12. Figure 1 below is part of the SAP ST12 ABAP trace screen showing the top hit list based on “gross %”, and Figure 2 is part of the SQL summary screen of the same execution.

Figure 1 SAP ST12 ABAP trace – many execution of SQL deletions

Figure 2 SAP ST12 SQL Summary

From the traces above, we can clearly see that improvement of this program can only come from changes that reduce database time, which accounts for 99.4% of the runtime. The database time spent by the program comes from 3 SQL delete statements in function module /Ortec/TLO_HLP_SSCR_VAR_DELETE, which is called twice by ABAP form f_completed (see Figure 3).

Figure 3 ABAP source code

What is the problem here? What is the solution?

2. The solution for ABAP performance improvement

Based on tips from “SAP ST12 Trace – SQL performance analysis”, an expensive SQL statement can be reviewed in the following areas to see whether it can be improved:

  • Check database table access strategy.
  • Review identical access.
  • Review table buffer strategy.
  • Review number of SQL execution.
  • Review data returned.
  • Review table data volume management strategy.

In this particular case, Figure 1 and Figure 2 show why those 3 SQL delete statements are so expensive: each was executed over 3,600 times. Each execution of the SQL is efficient; on average it took less than 1.2 ms per deletion (based on the ABAP trace and SQL summary). There is no identical selection. The database uses the primary index to execute the 3 SQL delete statements, whose where-clauses match the primary index. So this is not a table access strategy or identical selection issue. Based on the business process, the corresponding table is changed very often, so table buffering is not applicable. The table size is not relevant here either, since records are deleted via primary key and the table is not big. So it looks like we need to review the number of SQL executions to see whether they can be consolidated.

When we reviewed the source code (see Figure 3 above), we found that the function module has the simple task of deleting table entries using a single value, and the f_completed form that calls the FM is called from a loop. The FM is third-party code. So a proposal was given to the developer to change the program logic to replace individual deletion with mass/array deletion.

Based on the above input, the program logic was changed. The ABAP form f_completed is no longer called in the original loop. Instead, the records for deletion are collected in an internal table of the program within the original loop. The ABAP form was rewritten to use mass database deletion based on the internal table instead of calling the third-party code (Figure 4).

Figure 4 ABAP mass deletion based on internal table
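
The idea behind the change can also be sketched outside ABAP. The following Python example is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for the database, and the table name `work_items`, the key values and the helper names are hypothetical, not taken from the original program. It contrasts one DELETE call per loop pass with collecting the keys first and issuing one array call afterwards.

```python
import sqlite3

def make_db(n):
    """Create an in-memory table with n rows (hypothetical stand-in table)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE work_items (id INTEGER PRIMARY KEY, payload TEXT)")
    con.executemany("INSERT INTO work_items VALUES (?, ?)",
                    [(i, "x") for i in range(n)])
    return con

def delete_one_by_one(con, keys):
    """Original pattern: the program issues one DELETE per loop pass."""
    calls = 0
    for k in keys:
        con.execute("DELETE FROM work_items WHERE id = ?", (k,))
        calls += 1
    return calls

def delete_as_array(con, keys):
    """Tuned pattern: collect the keys in the loop, delete them in one call."""
    collected = [(k,) for k in keys]  # plays the role of the internal table
    con.executemany("DELETE FROM work_items WHERE id = ?", collected)
    return 1  # the program issues a single array call

keys = list(range(3600))
con_a, con_b = make_db(5000), make_db(5000)
calls_a = delete_one_by_one(con_a, keys)
calls_b = delete_as_array(con_b, keys)
left_a = con_a.execute("SELECT COUNT(*) FROM work_items").fetchone()[0]
left_b = con_b.execute("SELECT COUNT(*) FROM work_items").fetchone()[0]
print(calls_a, calls_b, left_a, left_b)
```

Both variants leave the same rows in the table; only the number of calls issued by the program differs. SQLite still processes the rows one at a time internally, so the sketch shows the program-side pattern, not the database-side saving measured in the figures.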

So what is the result of those changes?

The result of ABAP performance tuning

Figure 5, Figure 6 and Figure 7 below show when the above changes were implemented in our production system, together with a comparison of job runtime and table access performance. In this particular case, these simple changes brought up to a 98% performance improvement after they were moved into production on May 25.

Figure 5 ABAP version management – show date when mass change is in

Figure 6 Job runtime comparison – before and after performance tuning

Figure 7 STAD – table access comparison before and after performance tuning

Figure 8 Monthly resource utilization under individual deletion

Figure 9 Monthly resource utilization under mass deletion

Figure 8 and Figure 9 show the monthly resource utilization of the same program before and after the change: the mass deletion saves over 150 hours of database time.

Further clarification

Purely from a runtime point of view, the job could finish in 4 minutes prior to performance tuning. We tend to think that there is not much tuning opportunity for such a fast program. This case shows that a program finishing quickly does not mean its performance is perfect, unless it has gone through performance testing or was designed/coded by a professional who is an expert in performance. A program might run fast because of low volume or simple processing rather than sound performance design/code.

The reward from tuning a quick-running program depends on how frequently the program is executed. The priority of tuning a program depends on the business performance requirement and the resource footprint of the program. Performance tuning normally focuses on a program which cannot meet the business runtime requirement and which uses a lot of system resources.

Performance tuning can be an iterative process. If you would like to know more, please click SAP ABAP program performance tuning process.

SAP ST12 Trace – SQL performance analysis

This post continues my writing on how to tune SAP program/application performance based on SAP ST12 traces. My previous posts covered the overall SAP program performance tuning process and how to analyze the ST12 ABAP trace to improve SAP program/application performance. Here, I cover how to analyze the SAP ST12 SQL trace to tune SAP program/application performance from a development point of view.

1 SAP ST12 trace – SQL analysis

You might wonder where you should start when analyzing SQL statement performance. My choice is to use “Summarized SQL Statements” of the SAP ST12 performance trace and focus on the top expensive SQL statements. Please click here for how to navigate to the “Summarized SQL Statements” screen from the SAP ST12 performance trace.

1.1 Understand data in “Summarized SQL Statements” screen

Following is a portion of the “Summarized SQL Statements” screen from a SAP ST12 performance trace. By default, the display is sorted by the total execution time (duration) each SQL statement spends during the tracing period.

 

Figure 1 – SAP-PERF.CA

Before you analyze SQL performance based on the “Summarized SQL Statements” screen, you need to make sure that you understand the data presented on the screen:

– Number of times the SQL statement is executed by the program during the trace.

– Percentage of all table reads that are identical. Two reads on a database table are identical when the same program reads the same record twice. As long as the where-clause is the same on the same table, the reads are identical even if they sit at different locations in the program and return different fields. In Figure 1, the second line has a value of “1,200” in column “execution” and “58” in column “identical”; this means that 696 of the 1,200 executions are identical: 1,200 × 58 / 100 = 696.

– Duration is the sum of the individual execution times over the number of executions shown under the “executions” column. Records is the total number of records which the database server returns to the application server where your program is running.

– Average time per execution of the SQL = duration / number of executions. Average number of records retrieved from the database table per execution of the SQL = records / executions.

– SAP table buffer type. The possible settings are no buffering, single record buffering, generic area buffering and full table buffering. Blank means that the table is not buffered.

– SAP transparent table/pool/cluster/view name. When the source code accesses a cluster table or pool table, “Summarized SQL Statements” shows the cluster or pool name instead of the table name. For example, in SQL you can get change header or item information from table “CDPOS”, but ST12 shows CDCLS under the Obj. name column instead of “CDPOS”.

– This is a shortened SQL statement from the database SQL cache. You can see the full SQL statement by double-clicking it. You can also select the SQL and then click the corresponding button in Figure 1; SAP then shows you the SQL execution plan, which contains the full SQL. Several ABAP SQL statements can be mapped to one SQL statement in the summary window. You can review the related ABAP source code by clicking the corresponding button in Figure 1 after you place the cursor on the line you are interested in.
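
The derived columns above are simple arithmetic, which the following Python sketch spells out. The 1,200 executions and 58% identical reads come from Figure 1 as described above; the duration and record count are hypothetical values chosen only to make the averages easy to follow.

```python
def summarize(executions, identical_pct, duration_us, records):
    """Derive the per-statement figures described in the list above."""
    return {
        "identical_executions": executions * identical_pct / 100,
        "avg_time_per_exec_us": duration_us / executions,
        "avg_records_per_exec": records / executions,
    }

# 1,200 executions and 58% identical are from Figure 1;
# 2,400,000 microseconds and 3,600 records are hypothetical.
row = summarize(executions=1200, identical_pct=58,
                duration_us=2_400_000, records=3600)
print(row)
```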

At this point, I would like to clarify several things to avoid possible confusion about the data displayed in the “Summarized SQL Statements” screen shown in Figure 1.

The number of executions, identical selections and number of records returned in “Summarized SQL Statements” are reported from the database server point of view, not from the application/program logic point of view. For example, a SELECT-FOR-ALL statement is executed just once from the ABAP program logic point of view, but the number of executions in the “Summarized SQL Statements” screen might be more than 1, depending on the number of records in the ABAP internal table and on system settings. Also, the number of records returned by the database for a SELECT-FOR-ALL statement could be higher than the number of records the program sees, since the database interface on the application side removes duplicate records before the data is passed to the program.
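
A small simulation may make this difference between the program view and the database view concrete. It is a simplified model, not the actual SAP database interface: the function name, the tiny table and the block size of 2 are all hypothetical, and the real block size depends on system settings. The driver list is split into blocks, each block costs one database execution, and duplicate rows are removed before the program sees the result.

```python
def for_all_entries(driver_keys, table_rows, block_size):
    """Simplified model: one DB execution per block of driver keys."""
    db_executions = 0
    rows_from_db = 0
    seen = set()
    result = []
    for start in range(0, len(driver_keys), block_size):
        block = set(driver_keys[start:start + block_size])
        db_executions += 1
        fetched = [row for row in table_rows if row[0] in block]
        rows_from_db += len(fetched)
        for row in fetched:
            if row not in seen:      # duplicates are dropped on the
                seen.add(row)        # application side, not in the DB
                result.append(row)
    return db_executions, rows_from_db, result

table_rows = [(1, "a"), (2, "b"), (3, "c")]
driver_keys = [1, 2, 3, 1, 2]        # contains duplicate entries
executions, rows_from_db, rows_seen = for_all_entries(
    driver_keys, table_rows, block_size=2)
print(executions, rows_from_db, len(rows_seen))
```

From the program's point of view this is one statement returning 3 records; in this model the database executes 3 statements and returns 5 records.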

The number of records a program sees can also differ from what the database server actually processes. For example, when the database server does a “SUM” operation, it returns just one value/record, but the database might need to read all relevant records to compute the total.

All times here are in microseconds and are measured from the application server point of view, so the database time includes the impact of network traffic time and the IO system when applicable. There are general guidelines, based on average response time, to judge whether a database operation (sequential read, direct read, insert and commit) is impacted by a database performance issue (including network and IO). However, network and IO issues normally have a broad impact on SAP system performance instead of singling out one specific program.

1.2 Analyze Top expensive SQL operation.

Following are the steps I use to analyze a top expensive SQL statement based on the SAP ST12 SQL trace for tuning SAP program performance.

  • Check database table access strategy.
  • Review identical access.
  • Review Table buffer strategy.
  • Review number of SQL execution.
  • Review data returned.
  • Review table data volume management strategy.

All of the above checks (or “steps”, if you prefer that term) may be needed to analyze one SQL statement, or only one check may be needed for a particular statement. It all depends on your knowledge of the business solution, the program, the test case, and the SQL.

1.2.1 Check database table access strategy

This step reviews the SQL execution plan to see whether an index, or the right index, is used to read the table. We can then tune SAP program performance by speeding up data retrieval via an index or a more appropriate index.

First, you get the SQL execution plan and index details from the “Summarized SQL Statements” screen.

Figure 2 – SAP-PERF.CA

You can get index details by double clicking on the index name or get all indexes by double-clicking the table name in the execution plan.

Then, you review and analyze the execution plan by cross-checking the SQL where-clause against the execution plan. You also need to compare the SQL statement in the execution plan with the corresponding SQL code in the ABAP program. You can arrive at one of the following results:

1. Index is not used and database is using “full table scan” to read database data.

This can happen when the where-clause of the ABAP SQL code has no selection fields matching the fields of any existing index. Or the ABAP SQL where-clause (selection criteria) has one or more index fields, but during execution the program passes no data to those index fields. Please notice that the field sequence in the where-clause makes no difference to the execution plan for the Oracle CBO engine.

To tune SAP program performance in this case, possible solutions are:

  • Use alternative table for the information.
  • Change fields of where-clause to match existing index.
  • Change program to make sure that data is passed to index field(s) in where-clause.

2. Index is used but not correct.

This could be due to a complex where-clause in the SQL statement. I am not talking about a wrong index choice related to table or system statistics or database settings, which requires no ABAP effort to tune SAP program performance.

To tune SAP program performance, you need to simplify the where-clause, or use corresponding database hints in ABAP code to influence the database CBO index choice for this particular SQL.

3. Index is used but other indexes might be better.

This can happen when the table has several indexes and those indexes have common fields which are referenced in the SQL statement.

You can use corresponding database hints to tell the database which index should be used, review and change the table index design, or change the where-clause so that the database uses the preferred index.

4. Index is used but index itself is not selective.

This is normally due to index design. It usually happens with local tables or locally developed indexes.

To fix this, you need to redesign the index using “selective” table fields and arrange the index fields properly with consideration of query requests and clustering factor. What is the selectivity of a field? It is a measurement based on two figures: the number of distinct values of the field and the number of records in the table. The higher the number of distinct values for a field, the more selective the field is.

Also, the selectivity of an index and the selectivity of a where-clause are not necessarily the same. Sometimes an index is not selective but the where-clause can be very selective, because the data distribution (histogram) makes a difference. For example, suppose a table has 1,000 records and one field has only two distinct values, A and B. If B has 999 entries and A has only 1 entry, then selecting entries with the field equal to “A” is very selective, assuming the field is the only field of an index. Querying the table using value “B” is not selective.
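
The A/B example can be put into numbers with a short Python sketch. The two helper functions are my own illustration of the definitions given above (distinct values relative to record count, and the share of records a given value selects); they are not an SAP or Oracle formula.

```python
from collections import Counter

def field_selectivity(values):
    """Distinct values relative to the number of records in the table."""
    return len(set(values)) / len(values)

def fraction_selected(values, wanted):
    """Share of the table returned by the condition field = wanted."""
    return Counter(values)[wanted] / len(values)

# The table from the example: 1,000 records, 1 x "A" and 999 x "B".
values = ["A"] + ["B"] * 999

print(field_selectivity(values))        # field as a whole: 2 / 1000
print(fraction_selected(values, "A"))   # where-clause on "A": 1 / 1000
print(fraction_selected(values, "B"))   # where-clause on "B": 999 / 1000
```

The field as a whole looks poor with only two distinct values, yet a where-clause on “A” touches 0.1% of the table while one on “B” touches 99.9%.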

Now I would like to mention the “estimated cost” in the execution plan: the higher it is, the more expensive the SQL is estimated to be. With the CBO (Cost Based Optimizer), the database server always uses the execution plan which has the lower “COST”. But a lower cost does not mean that an index is always better than another index. A discussion of this is not the focus of this post.

1.2.2 Review identical access

This is to understand why there are identical accesses in “Summarized SQL Statements” and to see whether we can eliminate them to tune SAP program performance. You review identical access by reviewing the SQL source code and the SQL statement in the execution plan. One SQL statement in the “Summarized SQL Statements” screen can be mapped to more than one ABAP SQL statement at different program code locations, so there are two cases:

1. More than one SQL statements from the ABAP program are mapped to the SQL statement.

2. Only one SQL statement from the ABAP program is mapped to the SQL statement.

To tune SAP program performance, you can consider the following options:

  • Consolidate similar SQL statements. If you need two fields from a record at different program locations, it is better for program performance to retrieve both fields at one location instead of reading them separately.
  • Move the SQL statement to a more appropriate location: move it outside of a loop, put it at document header level instead of line-item level, or implement it at the higher organization level where it belongs (like company) instead of repeating it at every lower organization level (like plant).
  • Use a program-level buffer: retrieve the data once, then store it for later reference in the same transaction to avoid database table access. There are several SAP ABAP techniques to achieve this.
  • For a SELECT-FOR-ALL statement, sort the internal table and remove “duplicated records” before the internal table is used in the SQL statement to retrieve data from a table.

Identical access can sometimes represent a significant performance tuning opportunity, because the related program unit (or business function) might only need to be executed once but is executed many times due to improper design or coding. In Figure 1, if we assume that each access takes the same time, we can gain about 58% on the 2nd SQL just by removing the identical accesses.
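
The program-level buffer idea and the 58% estimate can be sketched as follows. This is a minimal Python illustration, not ABAP: the class `CountingTable` and the function `lookup_all` are hypothetical names, and the key pattern is constructed so that 1,200 lookups contain 504 distinct keys, which matches the 58% identical reads from Figure 1.

```python
class CountingTable:
    """Stand-in for a database table that counts how often it is read."""
    def __init__(self):
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return ("record", key)

def lookup_all(keys, table):
    """Read each distinct key once, serve repeats from a program buffer."""
    buffer = {}
    results = []
    for key in keys:
        if key not in buffer:
            buffer[key] = table.read(key)   # only the first access hits the DB
        results.append(buffer[key])
    return results

keys = [i % 504 for i in range(1200)]       # 1,200 lookups, 504 distinct keys
table = CountingTable()
results = lookup_all(keys, table)
saved_pct = 100 * (len(keys) - table.reads) / len(keys)
print(table.reads, saved_pct)
```

The program still gets 1,200 results, but only 504 reads reach the table, which is the 58% saving under the equal-time assumption stated above.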

1.2.3 Review Table buffer strategy

This is to review the table buffer status based on the “Summarized SQL Statements” screen to see whether we can enable or deactivate buffering to tune SAP program performance. You can also review, based on the code, whether the SQL statement bypasses the SAP buffer and whether the business logic needs to bypass it.

Which SQL statements bypass the SAP buffer? Following is the list:

  • Any SQL select statement with a keyword “bypassing buffer”.
  • Any select with a subquery or with joins
  • Any aggregate function (COUNT, MIN, MAX, SUM, AVG)
  • Group by
  • Select DISTINCT
  • Order By
  • Select for update

Click here for SAP document and examples on ABAP SQL statements bypassing buffer.

If the SAP program should not bypass the buffer when it accesses the table, then you need to change the SQL code so that the requested data comes from the application buffer instead of the database table.

Only SAP configuration tables and parameter tables are appropriate candidates for buffering, and only if the table is “seldom” changed and has an “appropriate” size. There is really no hard cut-off line for what “seldom” and “appropriate” mean. The general guideline is that a table can be buffered if the record change rate is under 1% and the table size does not exceed 5 MB. You might need to select the “right” buffer mode based on table size, changes and type of query.

If a buffered table is now changed more frequently, this can hurt your program performance as well, due to buffer synchronization. In this case, you need to disable the buffering.

I might write a post to share my experience with table buffering. You can click SAP table buffering to learn more in this area.

1.2.4 Review number of SQL execution

Here we review the number of executions of the SQL in the summary screen to see whether we can tune SAP program performance by reducing it. Reviewing the number of executions might also reveal that a subroutine is executed more often than is really needed! To reduce the number of SQL executions, you can consider solutions similar to those mentioned in section 1.2.2 of this post.

If there are individual “select”, “insert” or “update” operations on a table, using mass operations such as mass insert and mass update helps to tune program performance.

1.2.5 Review data returned

This is to review what information each record contains and how many records are returned by the database server, to ensure that only the data needed by the business operation is returned from the database, no more and no less. Eliminating unneeded fields from each record can result in a new execution plan, such as reading data from an index only, and can reduce round trips between the application server and the database server. Reducing the number of records returned by building more specific retrieval conditions into the SQL where-clause is better than retrieving records from the table and then filtering them at program level.
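
As a rough illustration of this point, the following Python sketch compares the two approaches on an in-memory SQLite table. The table `orders`, its columns and its contents are hypothetical; the sketch only counts how many rows and fields cross the boundary between database and program in each variant.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders "
            "(id INTEGER PRIMARY KEY, status TEXT, amount REAL, note TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, "A" if i % 100 == 0 else "C", float(i), "n" * 50)
     for i in range(1000)])

# Variant 1: fetch every row and every field, then filter in the program.
all_rows = con.execute("SELECT * FROM orders").fetchall()
open_in_program = sorted((r[0], r[2]) for r in all_rows if r[1] == "A")

# Variant 2: let the where-clause filter and return only the needed fields.
open_in_db = con.execute(
    "SELECT id, amount FROM orders WHERE status = ? ORDER BY id",
    ("A",)).fetchall()

print(len(all_rows), len(open_in_db))
```

Both variants produce the same 10 result records, but the first one transfers 1,000 full rows to get there.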

1.2.6 Review table data volume management strategy

The number of entries in a table has a big impact on the database time needed to find a record when the table primary key is not used for the search. Conversely, using an index or not makes no performance difference if the table has only a few records, in the same way that using a calculator or not makes no time difference when calculating 1+1. So we need to understand how long data should be kept in a table to meet business requirements and how long the data is actually kept in the table. If this understanding results in a “significant number” of records being removed from the table, then the SAP program performance can improve greatly even without any code change.

Last but not least, you can also review the top expensive SQL statements to see whether they are related and whether SQL on different tables can be combined via a join and/or database view; this can help SAP program performance as well.

2 Clarification

When I talk about improving SAP program performance, I mainly cover it from the application side: how we can improve ABAP program code (ABAP and SQL) and ABAP program/business solution design.

ST12 combines features from SAP transactions SE30 (ABAP trace) and ST05 (SQL trace). So what I state here is also applicable to ST05 (SQL) trace analysis.

Whether you should analyze both ABAP and SQL traces or just one of the two depends on your situation. If your program spends 99% of its time on the database side, then you should focus on SQL performance analysis to tune SAP program performance.

I am not talking about ABAP program performance from a system/network point of view, like table statistics, table/index storage status (fragmentation, allocation) etc. So you need to make sure that the performance issue is not due to the system (capacity, OS, DBMS, IO, network etc.). Please refer to performance introduction for details on what influences program performance. If all other programs are working fine in a system except your program, and the performance of your program is “consistent”, this normally means a code/design issue. If your program performance is becoming worse due to volume increase over a long period, the performance issue should be related to the design and/or code of the program. There are “general” criteria to judge whether database/storage performance can be a concern.

I am not talking about how to deal with a one-time SAP program performance incident where program runtime deviates greatly from the normal range. That is performance incident trouble-shooting.

Last but not least, you might need to search SAP OSS notes for a possible solution, especially when a standard SAP program has a performance issue which is not due to system resources or database decisions (like Oracle CBO etc.).

Why would SAP job runtime suddenly jump well above normal range?

The business was complaining that the runtime of a periodic SAP background job had suddenly jumped in the production environment. There was no code change and no volume change, and other jobs and transactions were running well. They asked why this happened. I was consulted and looked into the situation. I found that the issue was related to an Oracle execution plan change: Oracle chose a suboptimal execution plan. The plan change was triggered by a regular table statistics update. Job runtime was back to normal after the plan was switched to a better one by restoring the previous table statistics. This blog focuses on how I trouble-shot this particular SAP job performance issue. It covers:

  • The performance issue – Job is running up to 10+ times longer than it used to be,
  • The performance trouble-shooting – why this job is long running?
  • Going deeper – Why is the execution plan not efficient?
  • Fix performance issue – action and result,
  • What we learn from this case? and
  • Further clarification.

Please continue reading should you be interested in the details.

1 The performance issue – Job is running up to 10+ times longer than it used to be

A job which runs every 15 minutes and used to finish in less than 300 seconds was taking at least 600 seconds and up to 2,300 seconds to complete. Please refer to the following screenshot to get a better understanding of our issue.

Figure 1 SM37 job log – runtime history

Figure 1 shows that the job runtime jumped starting at 00:15:02 on Apr 24 2015. There was no code change, no volume change and no resource contention in the system… Why would the job runtime have such a huge jump?

2 The performance trouble-shooting – why is this job long running now?

Here, a performance trace was done on the job to find out where the program spends its time. The trace shows that the job spent most of its time executing one SELECT statement, and the SQL summary for the trace indicates that the number of records returned by the SQL is not high, so this is unlikely to be a volume issue. I then examined the SQL execution plan and identified that the hash join used by Oracle might be a bad choice. I checked the SQL execution plan history and found that the timing of the execution plan change correlates with the time when the job started to have the performance issue. Further, I found that the statistics of one of the underlying tables were updated immediately before the execution plan was changed. With that, I concluded that the job runtime jump was due to a table statistics update, which led Oracle to reparse the SQL and switch to a suboptimal execution plan.

2.1 Where is time spent by the long running job?

The performance trace showed that the job was spending 99.9% of its runtime on database operations, as below.

Figure 2 ST12 ABAP trace – why would job runtime jump

Based on the above trace, 99.6% of the time is spent on a single SQL statement, “Select VTTP”. So this SQL is responsible for the long job runtime. Why was the execution of this SQL taking so long? (Tried 3 times, cannot upload my Figure 2…)

2.2 Why is this SQL execution running long?

Is this due to an inefficient index or due to a lot of database entries being retrieved? We need to check that with the SQL trace. Following is the SQL summary from the SQL trace.

Figure 3 SQL summary – why would job runtime jump

From the above trace, we know that the number of records fetched from the database by the top expensive SQL is “0” and that the SQL is executed once. So the long runtime is not due to a lot of records being fetched. Is it due to a wrong index being used?

Now let’s review the execution plan…

Figure 4 sql execution plan – why would job runtime jump

The index used looks OK in Figure 4… but when I compared the number of records (0) returned by the SQL in Figure 3 with the number of records (7,971) projected by Oracle during plan generation, I smelled something wrong… This looks like an Oracle execution plan issue. Since this job used to work well, this must be due to a recent Oracle execution plan change for this particular SQL. Is that true?

2.3 Was the SQL long running due to Oracle execution plan changes?

 

Using program /sdf/RSORADLD_NEW (there are also SQL scripts available for this purpose), I got the SQL plan history shown in Figure 5 (right side).

Figure 5 Timing of job runtime change and timing of execution plan changes

Figure 5 shows that the time when the execution plan was refreshed and the time when the job started to run long are closely related. It could be that Oracle reparsed the SQL statement and refreshed the SQL execution plan before the SQL was executed. But Figure 5 does not tell which execution plan was used to execute the job prior to Apr 24 2015.

The following load history data shows that executions of this SQL were captured on Apr 24.

Figure 6 SQL load history – why would job run time jump

The job is executed every 15 minutes, every day. Yet the load history of this particular SQL was only captured on Apr 24. In our SAP system, we follow the normal approach: only the top expensive SQLs are captured into the snapshot. So this means that this particular SQL was NOT one of the top expensive SQLs prior to Apr 24.

2.4 Why would Oracle change the execution plan?

The following screen shows that the statistics of table VTTK were updated at 00:08 on Apr 24, 2015. That timing is directly linked to the time when the execution plan was changed.

Figure 7 table statistics update – why would job runtime jump

 

There are other reasons which can lead to a refresh of an Oracle execution plan. But in this case, the refresh is a result of the Oracle table statistics update. And the refreshed plan was suboptimal, which led to the job runtime jump.

2.5 Conclusion on why job is long running

The job was running long after 00:15 on Apr 24 2015 because it was spending significantly more time executing one SELECT statement. The execution plan of this particular SQL was changed/refreshed at 00:16:53 on the same day. VTTK table statistics were updated a few minutes earlier, at 00:08 on the same day. The table statistics update led Oracle to reparse the related SQL statement, which is a typical Oracle response. The new execution plan resulting from the statistics update is not as efficient as the original one. That made the job run longer.

So why is the new plan not good and what is the better plan? Let me take a closer look at the execution plan.

3 Going deeper – why is the execution plan not efficient?

In this section, I use sample values for the bind variables, the number of table entries and ST04 SQL cache data to validate the SQL execution plan in question.

In the previous section, we noticed that hash operations are used in several places of the plan. A hash operation is normally used to combine two datasets when a lot of records are returned. Let’s take a peek at the Oracle bind variables used in this SQL.

Figure 8 Value for Oracle binding variable – why would job runtime jump

A4 is field vttk-shtyp and A5 is vttk-fbgst; the total entries of VTTK and the entries meeting the selection criteria are shown below (Figure 7). According to the table definition, FBGST is a field storing the overall status of shipment and shipment stage. Value “A” for that field stands for “not processed” status. So the job appears to be interested only in “new” shipments. Based on the load history and SQL cache, we know that execution of this SQL returned “0” records most of the time.

Figure 9 SQL cache – why would job runtime jump

The following screenshots indicate the number of records in the VTTK table which meet the selection criteria.

Figure 10 table size and number of records – why would job runtime jump

 

In the current execution plan, since a hash operation is used, Oracle prepares two datasets first before applying the hash operation. When one dataset has no or only a small number of records and the other dataset can be read via an efficient index, a nested-loop operation is more efficient than a hash join. So the hash join operation at Step 7 in Figure 7 is not as efficient as a nested loop. I checked the execution plan of this SQL in our testing environment. The plan was different from the one in our production box. I put both plans side by side in Figure 11.

The plan in the testing box is better: when no records are returned from the hash operation using VTTK and VTTP, the read on LIPS is not executed by the nested-loop operation, so reading LIPS is spared.

Figure 11 SQL execution plan comparison – why would job runtime jump

The major execution plan differences between the test box and the production box are highlighted. The main one is at step 4, where the testing box uses a nested loop while the production box uses a hash join. Due to the hash join, production has to prepare datasets from the LIPS and VTTP tables via another hash operation, which is expensive given the selection criteria used and the table size. With the nested loop, step 9 would be spared when there are no records. You might agree with me: if Oracle replaced all hash operations with nested loops here, the performance could be even better.
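
The reasoning above can be illustrated with a deliberately simplified Python model of the two join methods. It follows my description in this section (the hash join prepares its dataset up front, the nested loop only reads what the driving rows ask for) and ignores any shortcuts a real optimizer may apply. The field names `doc` and `item`, the row counts and the pre-built dictionary standing in for a database index are all hypothetical.

```python
def hash_join(outer_rows, inner_rows, key):
    """Simplified hash join: read the whole inner set to build buckets."""
    inner_reads = 0
    buckets = {}
    for row in inner_rows:               # full read, even if outer is empty
        inner_reads += 1
        buckets.setdefault(row[key], []).append(row)
    joined = [(o, i) for o in outer_rows for i in buckets.get(o[key], [])]
    return joined, inner_reads

def nested_loop_join(outer_rows, inner_index, key):
    """Simplified nested loop: one index lookup per driving row."""
    inner_reads = 0
    joined = []
    for o in outer_rows:
        matches = inner_index.get(o[key], [])
        inner_reads += len(matches)
        joined.extend((o, i) for i in matches)
    return joined, inner_reads

inner = [{"doc": n % 500, "item": n} for n in range(10_000)]
index = {}                               # stands in for an existing DB index
for row in inner:
    index.setdefault(row["doc"], []).append(row)

no_new_shipments = []                    # the usual case: 0 qualifying rows
hj_result, hj_reads = hash_join(no_new_shipments, inner, "doc")
nl_result, nl_reads = nested_loop_join(no_new_shipments, index, "doc")
nl_one_result, nl_one_reads = nested_loop_join([{"doc": 7}], index, "doc")
print(hj_reads, nl_reads, nl_one_reads)
```

With no driving rows, both joins return nothing, but in this model the hash join still reads all 10,000 inner rows while the nested loop reads none, and only 20 for a single driving row.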

4 Fix the performance issue – action and result

The table statistics were restored to the version prior to the problematic update… After that, the execution plan in our production box changed to the plan we are seeing in our testing box. The job performance is back to normal.

Figure 12 Job runtime back to normal with restored execution plan

Figure 13 plan history after restoring statistics

The plan history shows that restoring the statistics changed the production plan to the same plan we are seeing in the test box.

5 What do we learn from this case?

An unexpected SQL execution plan change is a very likely reason for a job runtime jump, especially when the following conditions are met:

  • Other jobs or programs are running as normal.
  • Job spends most of time on database side or database time has a significant increase – STAD or ST03N,
  • Most of database time spent by the job is due to database read operation – STAD
  • There is no code change or job variant or volume change.

You can validate whether there is a plan change by checking the plan history:

  • Check whether the timing of the plan change matches the moment when the job runtime starts to jump.

Oracle regenerates the SQL execution plan when underlying table statistics are updated. A table statistics update is the typical trigger for an execution plan refresh. You can validate the timing of the statistics update against the timing of the execution plan change. A SAP application performance issue related to a table statistics change can be fixed by adjusting the table statistics to influence the SQL execution plan.
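
The timing check described here is easy to express in code. The following Python sketch uses the three timestamps from this case (statistics update at 00:08, plan change at 00:16:53, first slow run starting at 00:15:02 on Apr 24 2015). The function name and the 15-minute tolerance, taken from the job interval, are my own assumptions and not a rule from any SAP or Oracle tool.

```python
from datetime import datetime, timedelta

def timings_line_up(stats_update, plan_change, first_slow_run, tolerance):
    """True if stats update, plan change and runtime jump sit close together."""
    # The statistics update should come shortly before the plan change.
    stats_before_plan = timedelta(0) <= plan_change - stats_update <= tolerance
    # The plan change should sit close to the first slow job run.
    plan_near_jump = abs(plan_change - first_slow_run) <= tolerance
    return stats_before_plan and plan_near_jump

stats_update = datetime(2015, 4, 24, 0, 8)
plan_change = datetime(2015, 4, 24, 0, 16, 53)
first_slow_run = datetime(2015, 4, 24, 0, 15, 2)
tolerance = timedelta(minutes=15)        # assumption: one job interval

suspect = timings_line_up(stats_update, plan_change, first_slow_run, tolerance)

# Counter-example: a statistics update a full day earlier does not line up.
unrelated = timings_line_up(stats_update - timedelta(days=1),
                            plan_change, first_slow_run, tolerance)
print(suspect, unrelated)
```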

6 Further clarification

Oracle SQL plan management can prevent unexpected plan changes and so avoid SAP application performance issues due to accidental execution plan changes. Many other factors can influence job runtime, like inappropriate code changes, lock contention with other jobs etc.

In this particular case, a performance trace was done to understand the performance issue. But actually, there is no need for a performance trace to fix this kind of issue. SAP tools STAD and ST04 and tools based on Oracle AWR should be enough to trouble-shoot this type of performance issue. I am writing another post on how to use SAP STAD. In that post, I will give some insight on how to get this done.

It would be nice to know which execution plan was used for this particular SQL before Apr 24 2015. Apparently this was not possible in our system. ST04 and the Oracle AWR tools in our SAP system do not show the complete plan history. The Oracle plan history table does not contain data showing what the plan was before the issue. What could be the reason? At this moment, I am guessing it is related to our setting for how many SQL statements are captured into AWR, but I am not quite sure of that.

 

What types of SAP performance analysis can we do using SAP ST03N transaction?

In my previous post, I talked about how to navigate through the various analysis views of SAP ST03N transaction. In this post, I give an introduction to the typical types of performance analysis for which I have used SAP ST03N in my work.

  1. Who is placing workload into a SAP system and when/where/how is SAP workload generated?
  2. What is SAP system/application performance trend?
  3. Are there any potential resource bottlenecks in processing workload?
  4. Trouble-shooting SAP performance incident.

1 SAP system workload – who, when, where and how

SAP ST03N can tell you who is generating workload in a SAP system and when/where/how that load is generated for the selected period: last minutes, current period (current day/week/month) or past period (a day, a week or a month). “Where” here means which SAP instance, when a SAP system has multiple instances. Using this information, you can do a lot of performance analysis.

1.1 SAP workload distribution among different SAP tasks

In a SAP system, there are different tasks: Dialog, Background, RFC, ALE, Update task etc. Are you wondering how significant each task is in terms of CPU utilization? ST03N can answer such a question. You can pull data from the SAP ST03N Workload Overview and visualize the SAP workload distribution among different tasks via a chart similar to Figure 1 below. The pie chart reflects the load distribution of a SAP S2C production system.

Figure 1 SAP system application server CPU utilization distribution

The above chart clearly shows who (background processing) accounts for 60% of CPU utilization in the selected period. In a similar fashion, you can produce a chart of the database time distribution among the different task types, so you can quantify the level of resources consumed by each SAP task type. This type of information is available at system level or server/instance level for the selected period.
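
The percentages behind such a pie chart are simple to derive once the totals per task type have been copied out of ST03N. The following is a minimal sketch; the task names and CPU seconds are made-up illustrative values, not data from the system discussed here.

```python
# Hypothetical CPU seconds per task type, as might be copied from the
# ST03N Workload Overview for one period (values are made up).
cpu_seconds_by_task = {
    "Background": 6000,
    "Dialog": 2500,
    "RFC": 1000,
    "Update": 500,
}

def share_by_task(totals):
    """Return each task type's share of the total, in percent."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {task: 0.0 for task in totals}
    return {task: 100.0 * value / grand_total for task, value in totals.items()}

shares = share_by_task(cpu_seconds_by_task)
# Print the task types from the largest to the smallest share.
for task, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{task:<12}{pct:5.1f}%")
```

The same function can be fed database time per task type to get the database time distribution.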

1.2 SAP workload distribution among different SAP instances/servers

In a SAP system with multiple instances or servers, you can use ST03N to find out the share of the SAP workload that each of them processes. You can pull data from the ST03N Instance Comparison view and visualize the workload distribution among the SAP servers/instances in a chart similar to Figure 2.

Figure 2 SAP workload distribution among SAP Servers

Figure 2 clearly shows the SAP workload distribution among the different SAP instances, so you can see whether there is an overall load balancing issue in the selected period. Every server in a system is expected to process a similar amount of workload, assuming all servers have identical capacity and configuration. Figure 2 shows that Server 2 is handling significantly more load than the other servers. So the next step of the analysis is to understand the SAP tasks and their CPU consumption on Server 2.
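
A load balance check like this can also be done numerically. The sketch below flags servers whose load is well above the average; the server names, CPU seconds and the 25% tolerance are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical CPU seconds per application server for one period (made-up values).
cpu_seconds_by_server = {
    "server1": 2100,
    "server2": 4200,
    "server3": 1900,
    "server4": 1800,
}

def overloaded_servers(load_by_server, tolerance=0.25):
    """Return servers whose load exceeds the average by more than `tolerance`.

    `tolerance` is an arbitrary illustrative threshold (0.25 = 25% above average).
    """
    average = mean(load_by_server.values())
    return sorted(
        server
        for server, load in load_by_server.items()
        if load > average * (1 + tolerance)
    )

print(overloaded_servers(cpu_seconds_by_server))
```

This only makes sense under the same assumption as above, namely that all servers have identical capacity and configuration.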

1.3 SAP workload distribution among different hours

If you are wondering how the workload is distributed hour by hour over a 24-hour window, the SAP ST03N time profile can help you. It tells you when the system is busiest and when it is not. This is helpful if you need to find a time window in which the system can handle some unplanned workload, or if you would like to check whether an unnecessary peak load can be moved to an off-peak period to protect SAP performance.

Figure 3 SAP workload – hourly distribution

SAP ST03 or ST03N provides the hourly workload distribution for three intervals (daily, weekly and monthly), which gives you a better picture of the SAP workload distribution.

The level of workload in a SAP system is related to business operations, which are dynamic and keep changing, so it is normal for the workload to vary from hour to hour. However, if this type of workload analysis detects a clear pattern, for example one particular server at one particular hour always having a much higher workload than other hours and other servers, then it is worth looking into the details to see whether we can ease the workload “hot” spot.

1.4 SAP workload distribution among different transactions and programs

Based on data from the SAP ST03N EarlyWatch profile, you can see how the system load is distributed among the different SAP transactions and programs, and which transactions and programs are the most expensive in your system.

Figure 4 SAP workload distribution among different transactions and programs

This is valuable input for choosing which transactions to focus on for general performance tuning. SAP ST03N can also help you find out whether a transaction/program was executed at all in the period of interest.

This type of SAP workload analysis can show you a list of the top workload contributors, which should help if you would like to identify a list of programs/jobs for performance tuning.

1.5 SAP workload distribution among different sap users

Based on data from the SAP ST03N User profile, you can visualize the most expensive users in terms of workload and produce a chart like Figure 5.

Figure 5 SAP workload – CPU time distribution among users

This provides the workload distribution at SAP user level, regardless of individual business process and application. You need to review this type of data if you suspect that a particular user might be generating more load than it should.

SAP ST03N can help you find out whether a user was “active” in the period of interest and which transactions and programs were executed by a SAP user in a given time window.

1.6 RFC workload and load from external systems analysis

SAP ST03N provides detailed data showing which programs/transactions are generating RFC load in the system, which users are generating RFC load in the system, and which external systems are placing RFC load on the system. So if RFC load is a significant portion of your system load and you would like to analyze it, you can use data from the SAP ST03N RFC profiles and the “Load from External Systems” view to identify the top contributors. Based on that, you can decide whether further action, such as a performance trace and analysis, is needed.

Figure 6 SAP workload From External System

1.7 Load profile of individual SAP transaction/report

Performance tuning is normally about cutting time, so it is important to know where a transaction/program/job spends its time. SAP ST03N can show this: whether most of the time is spent on the application server side (ABAP logic), on database operations, and so on. Please note that this is average data for the period you choose when you run ST03N. In that period, the transaction/program might be executed many times, so the averages might differ from the time statistics of a specific execution.

Figure 7 SAP workload – transaction time statistics

Data similar to Figure 7 gives a general direction on where to focus when tuning application performance. If a program spends 99% of its time on CPU, you might need to focus on the ABAP logic instead of tuning the SQL statements of the program. In Figure 7, 1,024 ms out of the 1,355 ms average response time is spent on the GUI side. Since VA03 is a sales order display transaction, this could indicate a network issue or a front-end issue if a user complains about the performance of VA03, which normally performs well.
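
To make the VA03 numbers concrete, the share of one time component in the average response time can be computed as below. This is only a small worked calculation using the two values quoted above; the function name is my own.

```python
def component_share(component_ms, total_ms):
    """Share of the average response time taken by one component, in percent."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return 100.0 * component_ms / total_ms

# Values quoted above for VA03: 1,024 ms GUI time out of 1,355 ms response time.
gui_share = component_share(1024, 1355)
print(f"GUI share of response time: {gui_share:.1f}%")
```

Roughly three quarters of the average response time is GUI time, which is why the front end and network are the first places to look in this case.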

2 SAP system performance trend – system, transaction

If you wonder how system/application performance evolves over time, or how a SAP system performs after a significant change such as an upgrade, a migration or added resources, SAP ST03N can help you with this.

2.1 What is performance trend of a system

You can compare the average response time per transaction step for the main task types across several periods to see the overall system performance trend.

Figure 8 SAP system performance trend after migration

Figure 8 is real production data after an ECC system migration. In this particular case, the SAP system was moved from Itanium/HP-UX to x86/Linux servers. Based on Figure 8, a performance improvement is seen across all major SAP task types after the x86 migration. Correspondingly, all key business jobs we checked in our SAP environment show a significant performance improvement after the migration. When a SAP system performance trend analysis shows deterioration, a further review of the runtime components, such as CPU time and database time, is needed to identify the next analysis step.

2.2 What is performance trend of database system?

The average time taken for database access, such as sequential read, direct read and change operations, can be a meaningful indicator of overall database performance. You can visualize database operation performance by producing a chart similar to Figure 9.

Figure 9 ST03N workload – database operation performance

Figure 9 is a real example showing the impact of SAP MDS operation on background tasks in our production system during our x86 migration project. We saw slowness in many transactions and programs in the MDS drill 1 period. One particular process was impacted so badly by the SAP MDS migration solution that it missed its service level agreement. For more details, you can refer to my post on SAP MDS migration and system performance.

Need to know more? check book – Expert Oracle Database Architecture

2.3 What is performance trend of a transaction/program/job?

The Transaction Profile view of SAP ST03N shows the average response time per transaction step for three types of interval: daily, weekly and monthly. You can review the average response time per transaction step for a specific transaction over a long period. This gives an indication of how the performance of a SAP transaction/program/job evolves: it could be stable, be running longer and longer, or vary within a specific range. If a program is running longer and longer, action might be needed to find out what drives the growth in runtime.

Figure 10 SAP workload – performance trend of individual transaction/program
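
Whether a program is “running longer and longer” can be judged by eye from the chart, but a simple trend line makes it explicit. The sketch below fits a least-squares slope to a series of weekly averages; the weekly values are made up and the function name is my own.

```python
def trend_slope(values):
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator

# Hypothetical weekly average response times (ms per step) for one program.
weekly_avg_ms = [820, 835, 860, 905, 930, 990]
slope = trend_slope(weekly_avg_ms)
print(f"Average change per week: {slope:+.1f} ms")
```

A clearly positive slope over many weeks suggests growing runtime; a slope near zero with scattered values suggests the “varied but within a range” case.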

3 Resource bottleneck analysis

3.1 SAP work process utilization analysis

Several views of the SAP workload monitor ST03N contain a column named “Average wait time per dialog step”. The following screen is from the ST03N workload overview.

You can check the wait time at system and instance level for a specific day, week or month. You can also check the wait time hour by hour over 24 hours. Normally, the wait time should be only a few milliseconds. Wait time measures how long a job or transaction has to wait until the system can find a free SAP work process to run it. So, provided that no storage, network or database issue has put SAP work processes into an “abnormal” or “error” status, a consistently high wait time can indicate that not enough SAP work processes are configured in the system, that there is not enough CPU, or that the application occupies work processes for too long because of its design. You can refer to my post on the SAP work process monitor for more information on the status of a SAP work process.

A great book on SAP performance – SAP Performance Optimization Guide: Analyzing and Tuning SAP Systems, SAP Basis, SAP Administration

3.2 Other resource bottle-neck analysis

The SAP workload monitor calculates “Average Processing Time” and “Average CPU time” per transaction step. These can be used to monitor “wait” situations, such as waiting for free system resources (CPU, SAP memory), programmed wait/sleep due to a data object lock, waiting for a communication channel to be established, waiting for completion of an RFC, etc.

CPU time is consumed during the processing time, so the length of “Avg. CPU time” is related to the length of “Avg. Proc. Time”. If a transaction makes no RFC calls and contains no coded “sleep” or “wait” statements, then “Avg. Proc. Time” should be close to the CPU time and should be less than 2 times the CPU time. If this relation is broken, it normally indicates a shortage of CPU.
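
The rule of thumb above can be written down as a small check. This is a sketch under the stated assumption that the step makes no RFC calls and has no coded waits; the function name and the sample values are hypothetical.

```python
def cpu_shortage_suspected(avg_proc_ms, avg_cpu_ms, max_ratio=2.0):
    """Apply the rule of thumb: processing time should stay below
    `max_ratio` times CPU time for steps with no RFC calls or coded waits."""
    if avg_cpu_ms <= 0:
        raise ValueError("avg_cpu_ms must be positive")
    return avg_proc_ms / avg_cpu_ms >= max_ratio

# Hypothetical per-step averages in milliseconds (made-up values).
print(cpu_shortage_suspected(avg_proc_ms=300, avg_cpu_ms=250))  # ratio 1.2
print(cpu_shortage_suspected(avg_proc_ms=900, avg_cpu_ms=250))  # ratio 3.6
```

A positive result is only a hint; it should be confirmed with an operating system level CPU analysis.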

SAP ST06 can be used for CPU load analysis as well; see my post “How to use SAP transaction ST06 for SAP performance analysis” for details.

4 Performance incident analysis

SAP ST03N can be used to trouble-shoot a system/application performance issue by comparing system/application performance:

  • between the period when performance is bad and other periods when performance is good,
  • between the user(s) complaining of bad performance and users who are seeing good performance.

This performance comparison can help you pinpoint why a program/transaction has a performance issue, or at least identify the direction for performance trouble-shooting.
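
As an illustration of such a comparison, the sketch below takes the average time components per step for a good and a bad period and reports the component that grew the most. The component names and all values are hypothetical.

```python
def largest_increase(good_period, bad_period):
    """Return (component, delta_ms) for the component whose average time
    per step grew the most between the good and the bad period."""
    deltas = {
        name: bad_period[name] - good_period[name]
        for name in good_period
        if name in bad_period
    }
    component = max(deltas, key=deltas.get)
    return component, deltas[component]

# Hypothetical average times per step in ms (made-up values).
good = {"cpu": 120, "database": 200, "wait": 2, "roll_wait": 30}
bad = {"cpu": 125, "database": 640, "wait": 3, "roll_wait": 35}

print(largest_increase(good, bad))
```

In this made-up case the database component stands out, which would point the investigation to the database side first.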

Our ECC system feeds a SAP BW system. Once, many IDocs piled up on the ECC side waiting to be sent to the BW system. Using ST03N to compare the period when performance was good with the period when performance was bad, we found that the BW side was slow in loading the ECC IDocs. Further investigation found that this was due to an Oracle compression bug.

5 Further information

This post comes mainly from my own understanding and work experience. It is impossible for me to list all the types of performance analysis we can do with SAP ST03N. Some might find the response time distribution view of ST03N useful: one of the key SAP system performance indicators is the percentage of dialog steps that complete within one second. My next post on SAP ST03N will talk about the ST03N RFC profiles, for example how to understand the information in the various RFC profiles.

SAP migration and performance – Would use of SAP MDS migration tool impact application performance?

My customer needs to migrate their regional ECC systems from Unix-based systems to Linux-based systems. Some of the regional systems are very big, with database sizes over 10 terabytes. To reduce system downtime and mitigate the risks of the migration cutover, the customer decided to use SAP’s Minimized Downtime Service (MDS) to migrate those ECC systems. I am the single point of contact for performance in the migration project. If you are going to use SAP MDS to migrate your SAP system and wonder

  • whether SAP MDS would impact SAP performance and how big the impact is,
  • what you can do to mitigate the negative performance impact of SAP MDS,

you might be interested in this post.

1 SAP MDS Brief

When a SAP system is down, it cannot process business transactions, so reducing migration downtime is important for business operations. SAP MDS meets this need by migrating the majority of the data, if not all of it, from the source system to the target system while the source system is under normal business operation. At a high level, SAP MDS takes a clean database copy while the SAP system is up. It takes some time for the SAP migration tool to migrate the database copy to the target system. Meanwhile, ongoing business operation keeps generating “new” changes (new records or changes to existing records), which are captured via database triggers. After completing the migration of the initial copy (clone), MDS transfers those changes to the target system; this is called “synchronization”. In this way, when we need to switch from the old system (source system) to the new system (target system), the number of database records still to be migrated is minimal: the targeted synchronization level is 99% before downtime. The downtime needed to migrate a SAP system from one environment to another is therefore greatly reduced, as is the associated risk. SAP MDS is an incremental migration solution.

MDS uses database triggers and log tables to capture incremental changes. By default, the MDS solution creates a log table and a table trigger for each database table. Changes (deltas) on a table are captured via the database trigger and recorded in the table’s log table for later synchronization.
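
The general technique of trigger-based change capture can be illustrated with a few lines of SQL. The sketch below uses SQLite through Python purely for illustration; it is not how MDS itself is implemented, and the table, trigger and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER);
    -- Log table: one row per captured change (names are hypothetical).
    CREATE TABLE orders_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT,
        order_id INTEGER
    );
    CREATE TRIGGER orders_ins AFTER INSERT ON orders
    BEGIN
        INSERT INTO orders_log (op, order_id) VALUES ('I', NEW.id);
    END;
    CREATE TRIGGER orders_upd AFTER UPDATE ON orders
    BEGIN
        INSERT INTO orders_log (op, order_id) VALUES ('U', NEW.id);
    END;
    CREATE TRIGGER orders_del AFTER DELETE ON orders
    BEGIN
        INSERT INTO orders_log (op, order_id) VALUES ('D', OLD.id);
    END;
""")

# Normal "business" changes; each one also writes a row to the log table.
conn.execute("INSERT INTO orders (id, qty) VALUES (1, 10)")
conn.execute("UPDATE orders SET qty = 12 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")

captured = conn.execute("SELECT op, order_id FROM orders_log ORDER BY seq").fetchall()
print(captured)
```

The point to notice is that every change to the business table now causes an additional write to the log table, which is where the extra database load comes from.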

2 Would SAP MDS impact sap system/application performance?

It is stated that MDS impacts overall Oracle performance only minimally. But in our experience, we did see a negative performance impact, both technically and on the business side, after SAP MDS was enabled in our system.

2.1 SAP MDS – negative impact on system performance

Based on a technical analysis of database performance data, the MDS solution does have an impact on overall system performance. Database triggers were enabled in our production system on Sep 6 as part of the MDS solution. The following chart shows the weekly average response time for each sequential read and change (microseconds/operation). It clearly shows a database performance deterioration after MDS became active.

How would this impact business transaction run time?

2.2 SAP MDS – Negative impact on application performance

The following chart shows screen copies of the transaction profile from two weeks: the week of Aug 11 – 17 (left), when SAP MDS was not enabled, and the week of Sep 8 – 14 (right), when MDS was active.

From the above chart, you can clearly see that the database time per transaction step in the MDS week is noticeably higher for the top 12 reports/transactions. This contributed to a longer response time per transaction step for all top 12 reports/transactions except VA01 and ZVOMH. For VA01, the average database time is higher in the MDS week but is offset by a much lower CPU time, which leads to a better response time in the MDS week. The better performance of ZVOMH in the MDS week might be due to a much lower business volume: CPU time per transaction step in the MDS week is 404 ms/step, much smaller than the 878 ms/step of the pre-MDS week.

On average, response time per transaction step is 10-50% higher in the MDS week. The impact on individual jobs/programs varies, since each program has a different function, design and code.

You can use SAP transaction ST03N for this kind of application performance comparison, for example before and after an upgrade. The transaction profiles shown above were obtained with ST03N.

2.3 SAP MDS – Negative impact on individual background jobs

If a program/transaction runs 10-50% longer, would this impact business operation? That depends on the gap between the business performance requirement and the current performance of the program/job. For example, suppose the requirement is that a job has to finish within 1 hour and the maximum job duration is 10 minutes prior to enabling MDS. Then even if the job takes up to 59 minutes with MDS, it is just a technical impact on job runtime with no business impact. The business might not even notice the runtime increase.

The following table shows the average and maximum run time for a list of daily jobs (each executed once daily) from our production environment, over two periods of two weeks.

Business background job | Average run time, pre-MDS 2 weeks (sec) | Average run time, post-MDS 2 weeks (sec) | Maximum run time, pre-MDS 2 weeks (sec) | Maximum run time, post-MDS 2 weeks (sec)
R_XN_ON_IDOC_POST | 1,207 | 1,459 | 1,802 | 1,871
PP_2SKU1_SPLIT1 | 467 | 624 | 1,552 | 2,877
PP_MPSMRP_NA | 5,851 | 7,642 | 6,364 | 9,443
PP_ACTION_CNTL_FR_MRP | 1,247 | 1,417 | 1,300 | 1,604
R_NA_BWINV_DFC_CLEAR | 1,204 | 3,016 | 1,324 | 9,016
R_NA_BWINV_DFC_01 | 2,782 | 5,867 | 3,974 | 8,261

The above table shows that, with SAP MDS active, average job run time became about 13% to 150% longer, and maximum job run time grew to almost 7 times its pre-MDS value in the worst case (R_NA_BWINV_DFC_CLEAR: 1,324 s to 9,016 s). The jobs in the table run in sequence; they are the critical path of a job chain. The business has a cut-off deadline for the last job, in the last row of the table. Since each preceding job ran longer after SAP MDS was active, the accumulated effect on the last job was big, and this resulted in a serious miss of the cut-off time.
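
The percentages quoted for the table can be reproduced with a short calculation. The job names and run times below are copied from the table; the sum of average run times at the end is only my rough way of looking at the accumulated effect on a sequential chain.

```python
# (job, avg_pre, avg_post, max_pre, max_post) in seconds, copied from the table above.
jobs = [
    ("R_XN_ON_IDOC_POST", 1207, 1459, 1802, 1871),
    ("PP_2SKU1_SPLIT1", 467, 624, 1552, 2877),
    ("PP_MPSMRP_NA", 5851, 7642, 6364, 9443),
    ("PP_ACTION_CNTL_FR_MRP", 1247, 1417, 1300, 1604),
    ("R_NA_BWINV_DFC_CLEAR", 1204, 3016, 1324, 9016),
    ("R_NA_BWINV_DFC_01", 2782, 5867, 3974, 8261),
]

def pct_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

avg_increases = {name: pct_increase(a0, a1) for name, a0, a1, _, _ in jobs}
max_ratios = {name: m1 / m0 for name, _, _, m0, m1 in jobs}

for name in avg_increases:
    print(f"{name:<24} avg +{avg_increases[name]:6.1f}%   max x{max_ratios[name]:.2f}")

# Sum of average run times, a rough view of the accumulated effect on the chain.
total_pre = sum(a0 for _, a0, _, _, _ in jobs)
total_post = sum(a1 for _, _, a1, _, _ in jobs)
print(f"chain total (avg): {total_pre} s -> {total_post} s")
```

Summing the averages this way treats the jobs as strictly sequential with no idle time between them, which is a simplification.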

So now you have seen data showing that SAP MDS can impact performance. How can we avoid this? What can we do to mitigate the negative performance impact of SAP MDS?

3 How can we avoid SAP MDS’s negative impact on SAP performance?

As stated above, SAP MDS has to create database triggers to catch the changes and log tables to store them. Each change is stored in a log table in addition to the normal transaction table. MDS then needs to read the log tables and transfer the changes to the target server (synchronization) via parallel processes.

MDS activities use some amount of system resources. Make sure that your system has enough system resources (CPU, memory, disk space, network bandwidth) and sufficient SAP work processes to take care of MDS activities.

Prior to MDS, we should tune system settings to cater for the additional objects and load from MDS:

  1. Tune SAP memory, such as the SAP nametab buffer, to cater for the additional ABAP table objects,
  2. Size the database buffer and redo/log space for the additional table objects and the additional reads/updates on log tables,
  3. Identify the most-changed tables and their jobs,
  4. Remove the most-changed tables from the MDS solution, especially when the number of changes is high and the table size is small. You do not want to remove a huge table from the MDS solution: copying it over might need more downtime, which is against the objective of SAP MDS,
  5. Design the schedule of parallel processes used in SAP MDS data synchronization according to system load. Changes are transferred to the target system by parallel processes, and the number of parallel processes used in data synchronization can impact application performance. It is better to use fewer parallel processes in performance-critical periods, while more parallel processes can be used in non-critical periods. For example, SAP MDS is allowed to use up to 10 parallel processes for data synchronization in the critical period and 35 parallel processes in business hours. In other periods, we allow MDS to use 45 parallel processes, and
  6. Build hourly monitoring of the system based on pre-established performance KPIs, like log file synchronization etc., so that SAP MDS activities can be adjusted in time to avoid a negative impact on performance.
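
The time-dependent limit on parallel processes from item 5 can be thought of as a simple lookup by hour of day. The sketch below uses the limits 10, 35 and 45 quoted above; the hour ranges for the critical period and business hours are assumptions for illustration only, and this is not an MDS configuration format.

```python
# Limits quoted in the list above; the hour ranges are hypothetical and
# would have to come from your own workload analysis.
LIMITS = {"critical": 10, "business": 35, "other": 45}
CRITICAL_HOURS = range(1, 5)    # assumed: 01:00-04:59, e.g. a critical batch window
BUSINESS_HOURS = range(8, 18)   # assumed: 08:00-17:59

def max_parallel_processes(hour):
    """Return the allowed number of MDS synchronization processes for an hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour in CRITICAL_HOURS:
        return LIMITS["critical"]
    if hour in BUSINESS_HOURS:
        return LIMITS["business"]
    return LIMITS["other"]

for hour in (2, 10, 22):
    print(hour, max_parallel_processes(hour))
```

The ST03N time profile described in the earlier post is a natural source for deciding which hours belong to which period.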

From the application point of view:

  1. Review critical jobs and their performance position to identify potential victims of MDS activities,
  2. Take action to address performance concerns for potential victims before SAP MDS is enabled in your system:
    1. Schedule change – start the job earlier,
    2. Remove unnecessary dependencies,
    3. Speed up the process by breaking down the original volume and using parallel processing such as multiple jobs,
  3. Suspend non-critical jobs which make massive database changes, and
  4. Monitor critical application performance closely after SAP MDS is active, so that potential performance issues can be detected early and acted on.

In the above case, the performance issue was reported during the mock migration. We addressed it by removing one step from the job chain, since analysis indicated there was no absolute business need to build that job into the chain. We also split the last job into 3 jobs, disabled the table trigger on the table most changed by the program, and redesigned the plan of parallel processes used in the SAP MDS data synchronization stage. With those changes, the performance issue was resolved and the job was able to finish before the cut-off time during the 2nd mock migration as well as in the actual migration.

SAP System Log Review – SAP Terminal in status Disc

Recently I was involved in reducing the SAP SM21 system log. One of the top messages in our SAP SM21 log was “Terminal ##### in status DISC” together with “Delete session ### after error ###”. About 20 such messages were generated hourly under one particular user. I looked into this and fixed the issue. I also looked into other SM21 log messages: “Perform rollback” and “Canceled transaction”. In this blog I share my understanding of those errors and how they were addressed:

  • SAP SM21 log – “Terminal ##### in status DISC” & “Delete session ### after error ###”,
  • SAP SM21 log – “Perform rollback” and
  • SAP SM21 log – “Canceled transaction”.

1 SAP SM21 log – “Terminal ##### in status DISC” & “Delete session ### after error ###”

1.1 Overview of investigating “Terminal in status DISC” & “Delete session after error ###”

“Terminal in status DISC” means a broken connection between the SAP application server and the front end, such as SAP GUI. For “Delete session after error ###”, the error in this case is 023, which means that execution of a database operation was terminated before it could complete normally. In our case, HP Quality Center is used to monitor system status: VuGen scripts are executed from several servers to log on to SAP and simulate online user transactions. The solution was to make sure that the scripts are executed sequentially instead of in parallel, and to let each script wait for completion of the SAP transaction before it exits from SAP. That solution fixed the issue. Details are covered in the following sections.

1.2 The SM21 log

The following is a combination of SM21 SAP system log screens and the STAD transaction statistics screen for the same period, 01:00 – 02:00. Both screens are truncated. SAP STAD clearly shows that SAP transaction VA03 was executed repeatedly. The first “Terminal 00397 in status DISC” was logged at 01:07:34. Prior to that, there were several VA03 executions without any SM21 log message.

Figure 1 SAP SM21 Terminal in status DISC

Further review of other hours indicated that this SM21 log entry occurred every hour, but at random times within the hour, and that the log entries were related to 5 different terminals.

“Terminal in status DISC” and “Delete session 001 after error 04” come as a pair of messages. “Delete session 001 after error 023” comes alone. Please refer to Figure 2 for a sample.

Figure 2 SAP SM21 Delete session after error 023

1.3 What do the SM21 logs “Terminal in status DISC” and “Delete session 001 after error 023” mean?

“Terminal in status DISC” means that the SAP system is trying to send data to the terminal (client) but the terminal is DISConnected from SAP. There can be many reasons for this. Analysis of the statistics records (SAP STAD) under user HPMONITOR indicated that the VA03 transaction was executed from several clients/terminals. The owner of these executions confirmed that the VA03 transactions were executed via VuGen (Virtual User Generator) scripts of LoadRunner. Analysis of the statistics records also indicated that the VuGen scripts were executed concurrently, and all of them logged on to SAP under the same dialog user account. The SAP user license agreement does not allow sharing a user account, that is, the same user ID logging on to the SAP system twice at the same time. I thought that concurrent logon via VuGen scripts from different PCs could contribute to our issue. Based on that, we re-arranged the execution of the VuGen scripts to ensure that only one script from one location is executed at any one time.
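
Spotting concurrent executions in statistics records amounts to looking for overlapping time windows. The sketch below shows one way to do that once start and end times have been extracted; the record layout, terminal names and times are all made up and do not reflect the actual STAD export format.

```python
from datetime import datetime

# Hypothetical (terminal, start, end) records for one user, in the spirit of
# STAD statistics records. Terminal names and times are made up.
FMT = "%H:%M:%S"
records = [
    ("PC-A", "01:05:10", "01:05:40"),
    ("PC-B", "01:05:30", "01:06:05"),
    ("PC-C", "01:10:00", "01:10:20"),
]

def concurrent_pairs(rows):
    """Return pairs of terminals whose execution windows overlap in time."""
    parsed = sorted(
        (datetime.strptime(start, FMT), datetime.strptime(end, FMT), terminal)
        for terminal, start, end in rows
    )
    pairs = []
    for i, (start_i, end_i, term_i) in enumerate(parsed):
        for start_j, _end_j, term_j in parsed[i + 1:]:
            if start_j >= end_i:
                break  # rows are sorted by start time, so no later row overlaps
            pairs.append((term_i, term_j))
    return pairs

print(concurrent_pairs(records))
```

An empty result after re-arranging the script schedule would confirm that only one script runs at a time.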

With the above change, the “Terminal in status DISC” log entry no longer occurred hourly, but it still occurred occasionally on a daily basis. We had another error message, “Delete session 001 after error 023”, which was then the number 1 message logged under the user HPMONITOR. The SAP message text for error 23 said “process restarted”, which did not give me any direction, so I reviewed the developer trace. The following is a screen shot of the error message log and the developer trace (accessed via SAP transaction SM50).

Figure 3 Delete session after error 023 and SQL error 1013

From the developer trace, we can see that Oracle error 1013 is logged together with “Delete session 001 after error 023”. ORA-01013 is “user requested cancel of current operation”, which forces the current operation to end. There is no cancellation command in the script, and the message was logged randomly across executions of the same script. Why would execution of the script show such behavior?

I think this was because the script was sending a logoff signal to the SAP system while the operation in SAP had not yet completed. Depending on system status, the same operation in SAP can be faster or slower, and the logoff signal from the script can reach SAP sooner or later depending on network traffic. So it looked to me as if the error happened when the logoff signal from the script arrived at SAP before the previous step had completed. Adding some wait time in the VuGen script prior to logoff should help to fix this, so the VA03 VuGen script was modified by adding a few seconds of wait prior to the logoff step.

1.4 Result

With the above changes, execution of those scripts no longer generates any SM21 log entries in our system, as shown in the following screen shot, which combines the SM21 log (no entries under HPMONITOR) with the execution of those scripts (STAD).

Figure 4 SM21 – Terminal in DISC – issue fixed.

1.5 How to reproduce SM21 message log – “delete session after error 023”

Log on to SAP via SAP GUI from your Microsoft Windows PC and run SE16 against a big table, querying data without using a key/index field. While it is still querying data, terminate your SAP GUI via the Windows Task Manager. You will then reproduce the same SM21 log and a developer trace showing SQL error 1013, as shown in Figure 5.

Figure 5 SM21 – simulate SQL 1013 error

Alternatively, I believe the message “Delete session ### after error 023” can be simulated by terminating a running SAP work process in the middle of an SQL operation.

There are similar SM21 messages like “Connection to user, terminal lost”. The root cause is normally outside the SAP system but related to specific user/program behavior.

2 SM21 log – Perform Rollback

Here I show, via an example, how to trace an SM21 “Perform rollback” log entry back to the job/program which produced it, that is, how to find the parent of the SM21 message.

Steps used to find the SM21 log’s parent: SM21 -> /SDF/MON -> SM37 -> WE02/WE05

You can use the timing, user, work process number and server from the SM21 entry to search for the job/program name in the performance snapshots generated via SAP transaction /SDF/MON, as shown in Figure 6.

Figure 6 SM21 log and /SDF/MON – Job which generated “Perform rollback” log
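
The matching step between an SM21 entry and the /SDF/MON snapshots is essentially a filter on user, work process and server plus a time window. The sketch below shows that logic on made-up records; the field names, values and the 60-second window are assumptions and do not reflect the actual layout of either transaction.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

# Hypothetical SM21 entry and /SDF/MON-style snapshot rows; all field names
# and values are made up for illustration.
log_entry = {"time": "02:14:37", "user": "BATCHUSR", "wp_no": 12, "server": "app01"}
snapshots = [
    {"time": "02:14:30", "user": "BATCHUSR", "wp_no": 12, "server": "app01", "job": "JOB_A"},
    {"time": "02:14:30", "user": "BATCHUSR", "wp_no": 15, "server": "app01", "job": "JOB_B"},
    {"time": "02:20:30", "user": "BATCHUSR", "wp_no": 12, "server": "app01", "job": "JOB_C"},
]

def candidate_jobs(entry, rows, window_seconds=60):
    """Return job names from snapshot rows that share user, work process and
    server with the log entry and lie within `window_seconds` of its time."""
    entry_time = datetime.strptime(entry["time"], FMT)
    window = timedelta(seconds=window_seconds)
    return [
        row["job"]
        for row in rows
        if all(row[key] == entry[key] for key in ("user", "wp_no", "server"))
        and abs(datetime.strptime(row["time"], FMT) - entry_time) <= window
    ]

print(candidate_jobs(log_entry, snapshots))
```

The window should be at least as long as the snapshot interval, otherwise a matching snapshot can be missed.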

 

Then you can use SM37 to find the specific job, as shown in Figure 7.

Figure 7 SM37 Job details

Then you can check the job log or the job spool from SM37, depending on the situation. Here is the job spool.

Figure 8 SAP job spool

Figure 9 Job spool details

From Figure 9, you can see that IDoc 0000001390670257 is in status “51” and not posted. Using WE02/WE05 or SE16, you can check the IDoc status messages and see that the IDoc was processed again and again, each time issuing “perform rollback”. Based on the IDoc, you can find the corresponding function module which is used to load the IDoc.

Figure 10 Idoc status – technical info

Figure 11 Idoc was repeatedly processed

3 SM21 – transaction canceled

A transaction can be canceled for many reasons, such as program code (for example an “A” type message instead of an “E” type), an SQL error (like DBIF_RSQL_SQL_ERROR), a memory issue (like SYSTEM_NO_ROLL), manual cancellation and so on. The following screens show the “Transaction Canceled” message.

Figure 12 SM21 – transaction canceled

If you would like to trace a particular “Transaction Canceled” line back to the related job and program, and there is no short dump, here are the steps:

SM21 (timing, user, SAP work process, server) -> /SDF/MON (timing, server, job name, work process) -> SM37 (verify job status)

Figure 13 SM21 – Transaction canceled and its parent

When there is a short dump related to an SM21 log entry, you can use SAP transaction ST22 to find the parent of the SM21 log entry. With job name, timing, server name and work process number, it is possible to associate an SM21 log message with a specific job execution via SAP transaction SM37. Figure 14 shows the specific job instances related to the highlighted entry in Figure 13.

Figure 14 SAP SM37 JOB details

The message “No delivery items found” showed up in the job log (Figure 15) with message type “A”. That is why the message went to the SM21 log. So this is an application issue.

Figure 15 SM37 job cancelled under SAP message type “A”

Figure 16 shows that whenever the job was canceled, a related message was logged by the SAP system.

Figure 16 SM21 log and SM37 Job

From Figure 16, we can see that the job was executed every 20 minutes and failed almost every time. Apparently, this was an application issue. On further analysis, I found that it was due to the same problematic IDoc. Making the message disappear from the SM21 log is a simple fix, but that would not solve the data issue, nor the question of how the program design should handle exceptional cases. From the job setting point of view, the frequent job cancellation can be avoided by excluding the problematic IDoc from being processed each time.

In this post, I shared several cases showing how to review SM21 log messages, trace them back to their originators and fix them. Sometimes SM21 messages are logged even though everything went normally from the end business view (in our case HPMONITOR), but in most cases users would notice something abnormal, or the problem can be monitored from the application side, like an SAP SM37 job cancellation. Monitoring job cancellations via SM37 and fixing them can make the corresponding SM21 log entries disappear, if there are any. So I am not suggesting that we should start with SM21 to fix an issue. The preferable starting point is for application teams to monitor the health of their operations closely and fix any issue properly. Even when there is an SM21 log entry related to an operation, it may be all right and normally nothing to worry about.