SAP production performance issue trouble-shooting

If you support application performance and are wondering where to start when an SAP user reports a performance issue, this post might help you get started.

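SAP performance issue trouble-shooting normally involves the following steps: gathering information, verifying the performance issue, classifying the issue and identifying a solution, and implementing and verifying the solution.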

I would like to point out that performance trouble-shooting can be an iterative process in some complex situations. There is no clear-cut boundary between those steps. How fast you can trouble-shoot a performance issue depends on your experience with and knowledge of SAP architecture, applications, databases, networks, and your own SAP application and system.

1. Gather information

When a user reports a performance issue, we need to gather information. The user is expected to provide the following information as needed, based on the performance scenario:

  • What is the performance issue/performance deviation? What is the expected runtime and what is the actual runtime?
  • Which program, job, system, server or user is involved in the issue?
  • When did the issue start, in terms of system time?
  • What is the solution architecture and data flow of the program?
  • Are other users experiencing the same issue, if more than one user is running the transaction/program?
  • How was the underlying transaction/program/job executed?
  • Has there been any change in the way the program/job is executed?
  • Was the program/transaction processing more volume than normal when the performance issue happened?
  • Have there been any code changes to the program, and when?
  • Have there been any functional configuration changes related to the program, and when?
  • Is this the first time the user has encountered such a performance issue?
  • Is this a one-time issue or a consistent issue? What pattern has been observed?
  • How critical is this issue from a business operation point of view? This understanding would not change the method of trouble-shooting.

Based on the above information, you can then dig into the system to get more information. For an ongoing issue, you can start with SM50/SM66 to verify the current status of a running transaction. Based on the work process status shown in SM50/SM66, you can decide whether there is a need to check system health (resources, database, application locks, etc.). You can use ST12 to capture one or more performance traces, depending on the scenario. For a performance issue that happened in the past, you can start with STAD/ST03N/SM37 or /SDF/MON to get statistics data. If it is a job, you can use SAP transaction SM37 to check historical run-time information.

If SM50/SM66 shows that all dialog processes are busy, you can conclude that this is a system-level performance issue and that it is likely related to resources. If SM50/SM66 shows that only a few SAP work processes are active, you can conclude that system resources are unlikely to be a contributor. However, since SM50/SM66 only shows SAP processes, you might come back to CPU resources if further checks find no useful hints or indicate that CPU is still a concern. For example, I have seen cases where UX-level non-SAP processes were using too much CPU power that would otherwise have been available to the SAP application, causing a performance issue.
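The SM50/SM66 rule of thumb above can be written down as a small decision function. This is only a sketch: the `WorkProcess` record, the type and status labels, and the `few_ratio` cut-off are assumptions made for illustration, modeled on a snapshot copied out of SM50 by hand, and not an SAP interface.

```python
from dataclasses import dataclass


@dataclass
class WorkProcess:
    """One row of a hypothetical, manually exported SM50/SM66 snapshot."""
    wp_type: str  # assumed label, e.g. "DIA" for dialog, "BGD" for background
    status: str   # assumed label, e.g. "Running" or "Waiting"


def triage(processes, few_ratio=0.25):
    """Apply the rule of thumb to a list of WorkProcess rows.

    few_ratio is an assumed cut-off for "only a few processes are active".
    """
    dialog = [p for p in processes if p.wp_type == "DIA"]
    if not dialog:
        return "no dialog processes in snapshot"
    # Treat every dialog process that is not idle as busy.
    busy = sum(1 for p in dialog if p.status != "Waiting")
    if busy == len(dialog):
        return "system-level issue, check resources first"
    if busy / len(dialog) <= few_ratio:
        return "resources unlikely, trace the program"
    return "inconclusive, check system health"


snapshot = [WorkProcess("DIA", "Running")] * 4 + [WorkProcess("BGD", "Waiting")]
print(triage(snapshot))
```

The middle "inconclusive" branch is deliberate: as noted above, SM50/SM66 only shows SAP processes, so a snapshot on its own cannot rule out CPU pressure from elsewhere on the host.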

2. Verify performance issue

This step checks both current and historical system data to validate the data we have collected about the issue. It is important for correcting any assumption or communication issue related to the performance issue. At this step, you normally need to get historical and holistic information about the incident from the job log, the system's performance database, etc.

I have had cases where users reported a run-time issue but the runtime was actually within the historical run-time range. A production performance incident is about a performance deviation and restoring the normal performance of a program/job. Also, sometimes a performance issue is due to recent changes that were made without the user’s awareness.
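One simple way to make the "is it really a deviation?" check concrete is to compare the reported runtime against past runtimes of the same job. The sketch below assumes you have already collected the historical runtimes by hand (for example from SM37); the function name, the sample numbers and the 10% `slack` are all illustrative assumptions, not values from any SAP tool.

```python
def within_historical_range(history_seconds, reported_seconds, slack=0.10):
    """Return True if the reported runtime is no worse than past runs.

    history_seconds: past runtimes of the same job, in seconds.
    slack: assumed tolerance above the slowest historical run.
    """
    if not history_seconds:
        raise ValueError("no historical runs to compare against")
    limit = max(history_seconds) * (1 + slack)
    return reported_seconds <= limit


history = [600, 640, 620, 700]  # made-up sample runtimes
print(within_historical_range(history, 680))   # inside the historical range
print(within_historical_range(history, 1500))  # a real deviation
```

If the function returns True, the incident is more likely a communication or expectation issue than a performance deviation, and it is worth going back to the user before tracing anything.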

3. Classify performance issue and identify solution

Now we come to the point of classifying the performance issue and identifying a solution. This might involve the following steps and actions:

Resource here refers to system hardware resources (CPU, memory, storage) and their configuration/allocation, including CPU, memory, IO and SAP basis configuration/settings. Local system refers to the combination of OS + DBMS + SAP basis components. Remote system/client refers to the SAP client and any other system the SAP system needs to communicate with. Business solution/SAP application here refers to the combination of SAP and customer code related to a business transaction/function. Network issue refers to the connection and data flow between different systems. Execution issue refers to the way a business user executes an SAP transaction/function.

  • Resource issue.
    • User consumption issue: more resources are used by the application, the system and system tools
      • Application/solution itself, like program design etc.
      • Identify improvement opportunities via SAP performance tools like ST12/ST05, SE30 etc., then change the solution design/code.
    • Resource utilization issue
      • Load distribution issue, which can cause a resource issue
        • Redistribute system load via tool /SDF/MON etc.
    • Resource configuration issue and resource shortage or malfunction
      • CPU shortage issue (OS06, STAD and/or /SDF/MON) -> bring more capacity to the application server.
      • Memory shortage issue (ST02, OS06, ST04 etc.) -> increase physical memory.
      • Space issue (SM21, ST04, DB02 etc.) -> adjust filespace or tablespace etc.
      • IO subsystem issue.
      • Other system configuration like spool, enqueue etc. -> adjust system settings for spool, enqueue etc.
  • Local system issue.
    • Table access issue.
      • Statistics issue -> update/maintain statistics (system or SAP table) via DB13, the RSANAORA tool etc.
      • Index/table data fragmentation/ordering issue -> rebuild or reorder the table/index.
      • System/database parameters issue -> tune corresponding system or database parameters/setting.
    • Kernel issue/system bug.
      • Work with SAP + DBMS vendor + OS vendor to fix system issue.
  • Remote system/client issue.
    • Contact remote system/application owner to fix the issue.
  • Business solution/SAP application issue.
    • Standard SAP code/design issue.
      • Work with SAP for the solution.
    • Locally developed code/design issue.
      • Identify improvement opportunities via performance tools: Code Inspector, ST12, SE30, ST05 etc.
        • ABAP issue -> fix corresponding expensive ABAP statements.
        • Database access issue -> fix corresponding expensive SQL statements.
        • Design/configuration issue -> change program design/related function configuration.
    • Data volume management issue.
      • Review data retention period and archiving solution.
  • Network issue.
    • Fix network issue.
  • Execution issue.
    • End user execution issue.
      • Educate end business user or introduce system/program level control.
    • Deploy/schedule/process issue (database locks etc)
      • Redeploy/reschedule application/jobs to different time window or adjust sequence of execution.
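Part of the classification above is effectively a lookup from symptom to the transactions to check and the usual remedy. As a rough sketch, the resource-shortage entries could be kept in a small table like the one below. The dictionary layout and the `next_step` helper are hypothetical; the transaction codes and actions are copied from the list above.

```python
# Symptom -> (transactions to check, usual action), taken from the list above.
RESOURCE_CHECKS = {
    "cpu shortage": (
        ("OS06", "STAD", "/SDF/MON"),
        "bring more capacity to the application server",
    ),
    "memory shortage": (
        ("ST02", "OS06", "ST04"),
        "increase physical memory",
    ),
    "space issue": (
        ("SM21", "ST04", "DB02"),
        "adjust filespace or tablespace",
    ),
}


def next_step(symptom):
    """Return a one-line hint for a known symptom (hypothetical helper)."""
    key = symptom.strip().lower()
    if key not in RESOURCE_CHECKS:
        return f"no entry for '{symptom}', classify manually"
    tools, action = RESOURCE_CHECKS[key]
    return f"check {', '.join(tools)}; then {action}"


print(next_step("CPU shortage"))
```

A table like this only covers the cases where the symptom is already known; the harder part of the classification still depends on the traces and statistics gathered in the earlier steps.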

Which steps are needed and which step is executed first depends on the performance issue scenario and your experience. I am just sharing a technical road map for resolving a performance issue from a technical point of view, not a business process. For example, if a running program is aborted due to a memory issue, you can directly check the resources used by the program in ST22 and the system memory utilization and configuration via ST02. From there, you can confirm whether this is a resource issue or a program issue, which is quite straightforward. I will cover this in detail in a later post on how to trouble-shoot job cancellation due to memory issues.

It is important to classify whether the issue is, or will be, a one-time issue or a consistent issue. A one-time performance issue can be due to a one-time system status, abnormal application volume, network status, improper execution, etc. For a one-time issue, there is no need to make any system/program level tuning or changes; we just need to rerun the affected application/transaction properly. For example, a user running a large one-time load during busy business hours can cause a system performance issue with symptoms like CPU shortage, process shortage and/or memory shortage, but this is not a “true” system resource issue if it can be avoided via stricter control over the execution, such as the time window and/or degree of parallelism.

4. Solution implementation and verification

This step makes the corresponding change to the relevant technical component and/or the way the application is executed, and verifies that the performance concern is gone via the needed performance tools like STAD, /SDF/MON etc. and/or user confirmation.

I am planning to write more posts in this area, such as how to tell whether a performance issue is due to program/design, how to identify the performance bottleneck of a program, and how to tell that the system is not the root cause of a performance issue.
