Performance Testing Traps / Testing Pitfalls

Did the results of your last performance test outperform even optimistic expectations?
Before publishing those test results, we recommend checking whether you ran into one of the following test traps (pitfalls):

Trap                                                  Potential differences   Potential differences
                                                      comparing tests (*)     comparing tests with
                                                                              production
Trap 1: Unrealistic Cache Hits                        yes                     yes
Trap 2: Sorted Test Data                              no                      yes
Trap 3: No-Archivelog Mode                            no                      yes
Trap 4: Triggers / Snapshot Logs for Replication
        to EDW / DWH / Reporting Server deactivated   no                      yes
Trap 5: Varying Storage Performance                   yes                     yes

(*) Comparing tests on the same test environment with the same test data


Trap 1: Unrealistic Cache Hits

Unrealistic cache hits can occur in the

  • database cache
  • file system cache
  • SAN cache

when repeating test runs with the same test data but different testing parameters - a typical scenario during extensive performance, load, and stress testing.

How to identify

Cache hits in the file system cache and in the SAN cache can be identified by the disk latency reported in the database performance statistics.
A disk latency of less than 5 milliseconds for reading one 8 KB or 16 KB database block is “too fast” for a real spindle hit.
A latency of 5 milliseconds is an excellent value; 6-8 milliseconds is good.
If you do not expect such (high) cache hit rates in the file system cache and SAN cache in a realistic production scenario, this fact needs to be reported.
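On Oracle (assumed here purely as an example; other databases expose similar wait statistics), the average single-block read latency since instance startup can be checked like this:

  -- Average latency of single-block disk reads ("db file sequential read"),
  -- in milliseconds; values well below 5 ms over a sustained test usually
  -- indicate file system or SAN cache hits, not real spindle reads.
  SELECT event,
         total_waits,
         ROUND(time_waited_micro / total_waits / 1000, 2) AS avg_latency_ms
  FROM   v$system_event
  WHERE  event = 'db file sequential read';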

How to avoid

Database cache: Flush the database cache before each test (a sketch for Oracle follows below); if your database does not provide such a command, you need to restart the database.
File system cache: Reboot the server.
SAN cache: If you do not expect cache hits on the SAN storage in the real production cycle, include in your performance test results not only the measured raw data but also an estimate of the slower but more realistic results to be expected in real production scenarios. You might also discuss this topic with your SAN vendor, e.g. whether caching can be reduced during your test.
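A minimal sketch for the database cache, assuming an Oracle database (10g or later):

  -- Flush the database buffer cache before each test run; other databases
  -- may only offer a full restart to achieve the same effect.
  ALTER SYSTEM FLUSH BUFFER_CACHE;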


Trap 2: “Sorted” Test Data

A prerequisite for performance tests is the creation of a sufficient volume of test data - in the optimal case, the full volume of customers.

For that purpose, scripts are used to create artificial customers, e.g.
customer_id = 1, name = “Customer1”
customer_id = 2, name = “Customer2”.

Assuming an average record length of 200 bytes, an 8 KB database block will contain data for about 40 customers.

If, in real life, those customers create transactions at random, and if - due to the large size of the table(s) containing customer data - only a small percentage of those table(s) is in the database cache, then this results in physical reads; and in most cases only 1 of the 40 customers in the database block read from disk will be accessed.

However, if the transaction test data for your performance test contains the transactions ordered by customer id, then during the test all 40 customer records will be accessed in the short time this database block remains in the database cache.

How to avoid

Create test data with sufficient randomness, and ensure the same randomness when processing those test data (see the sketch below).
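A minimal sketch of both steps on Oracle; the table and column names (customers, txn_driver) are examples only:

  -- Create artificial customers (example schema):
  INSERT INTO customers (customer_id, name)
  SELECT level, 'Customer' || level
  FROM   dual
  CONNECT BY level <= 1000000;

  -- Build a transaction driver table that lists the customers in random
  -- order, so consecutive test transactions do not hit the ~40 neighbouring
  -- customer records stored in the same database block:
  CREATE TABLE txn_driver AS
  SELECT customer_id
  FROM   customers
  ORDER  BY DBMS_RANDOM.VALUE;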


Trap 3: No-Archivelog Mode

When the database runs in no-archivelog mode, performance tests may show results that are too good, i.e. faster than the performance achievable in the real production environment, because of three effects:

  1. Less disk IO, because redo log files are not copied to the archive log destination after each log switch.
  2. Less CPU consumption, as the "archiver" process copying redo log files to the archive log destination does not start.
  3. Operations like "create index", "alter index ... rebuild", and "create table as select" do not create redo log data when the database operates in no-archivelog mode. In this case, even the log writer processes less data.

This trap has a high impact on environments which already suffer from slow disk reads and slow disk writes (IO-bound).

How to avoid

  • Operate the database in archivelog mode during performance tests (a quick check for Oracle is sketched below).
    • Note: To avoid the file system for archive log files running full, some testers just write scripts to delete (rm) those files. This frequently used practice is not optimal, as it omits the additional disk IO load caused by reading those files again to back them up to tape.

A more detailed evaluation of this topic is available in “Transaction Log / Archive Log Volume – An "Awareness Paper" for Application Support / Application Operations, Operations Managers and Change Managers”.
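On Oracle (assumed here purely as an example), the current log mode can be verified and, if necessary, changed from SQL*Plus:

  -- Check the current log mode; it should report ARCHIVELOG:
  SELECT log_mode FROM v$database;

  -- Switching to archivelog mode requires a mounted, not open, database:
  SHUTDOWN IMMEDIATE;
  STARTUP MOUNT;
  ALTER DATABASE ARCHIVELOG;
  ALTER DATABASE OPEN;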



Trap 4: Triggers / Snapshot Logs for Replication to EDW / DWH / Reporting Server deactivated

Background

Recording data changes using triggers, so that other systems can easily replicate just the changed data, is a very common practice.
Those change-tracking records (e.g. "Snapshot Logs"), stored in database tables, are automatically deleted after the changed data has been replicated to the destination system(s).

However, those destination systems won't replicate data from a testing system.

For this reason, the recording of those data changes is simply stopped, as otherwise the database tables recording the changes would grow forever.

Impact and Testing Trap: In this case, each DML statement (insert, update, delete) is faster, consumes less CPU, and creates less transaction log / redo log volume.

Note 1:

If the (too low) redo log / archive log / transaction log volume is used for

• sizing the file system for those archive log / transaction log files
• sizing the required performance (MB/second) for those files
• sizing the required backup performance for archive log / transaction log files

then those too low values will result in wrong sizing - and could, in the worst case, cause production incidents because the archive log destination / transaction log file system runs full.

Example: Updating a database column from "1" to "0" or "null" results in only a few bytes of transaction log / redo log, but would create a much longer "insert" in the change-tracking table. For this reason, the increase in transaction log / redo log volume can be significant; it depends on the type of database activity and is very application-specific.
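To illustrate the mechanism, here is a hypothetical change-tracking trigger in Oracle PL/SQL (all table, column, and trigger names are examples only; real implementations such as snapshot / materialized view logs are generated by the database):

  -- Tracking table: even a 1-byte column update on "customers" causes
  -- a full row insert here (key, operation type, timestamp).
  CREATE TABLE customer_changes (
    customer_id  NUMBER,
    operation    VARCHAR2(1),
    change_time  TIMESTAMP
  );

  CREATE OR REPLACE TRIGGER trg_customer_changes
  AFTER INSERT OR UPDATE OR DELETE ON customers
  FOR EACH ROW
  BEGIN
    INSERT INTO customer_changes
    VALUES (COALESCE(:NEW.customer_id, :OLD.customer_id),
            CASE WHEN INSERTING THEN 'I'
                 WHEN UPDATING  THEN 'U'
                 ELSE 'D' END,
            SYSTIMESTAMP);
  END;
  /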

Note 2:

The database writer process also has less work to do, and it may happen that this process (or the IO system) becomes the bottleneck on the real production system, due to the (much) higher database write activity.

How to avoid

• Do NOT disable the triggers recording data changes.
• Create a script which just deletes the change-tracking records (e.g. snapshot logs), as sketched below.
• Schedule that script with the same frequency as the destination systems would refresh the data (and delete the change-tracking records).
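A minimal sketch on Oracle; the snapshot log name mlog$_customers and the hourly frequency are examples only:

  -- Purge the change-tracking records instead of disabling their recording:
  DELETE FROM mlog$_customers;
  COMMIT;

  -- Schedule the purge with the same frequency as the destination systems
  -- would refresh the data:
  BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
      job_name        => 'PURGE_CHANGE_TRACKING',
      job_type        => 'PLSQL_BLOCK',
      job_action      => 'BEGIN DELETE FROM mlog$_customers; COMMIT; END;',
      repeat_interval => 'FREQ=HOURLY',
      enabled         => TRUE);
  END;
  /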


Trap 5: Varying (SAN, NAS) Storage Performance

Using a dedicated server for performance testing, to avoid any impact from other testing activities, is common practice: no CPU or memory is consumed by anything other than the performance tests.

If local disks are used, the performance of the storage (disk IO performance) should be constant too.

However, if shared storage is used, e.g. a SAN (Storage Area Network) or NAS (Network Attached Storage), the IO performance might vary between performance tests.

Even QoS (Quality of Service) cannot eliminate this completely.
E.g. defining a target of 8 milliseconds per random read would ensure that the latency does not increase beyond 8 milliseconds.
However, some test runs might still get better values.

How to deal with this uncertainty?

Measure and document the disk latency during each test (e.g. with the latency query shown under Trap 1).
If the values are fairly constant across all tests, the results are easily comparable.
If the results vary between tests, the impact of the varying IO performance needs to be determined and considered when comparing the test results.

Analytical Approach:

Suppose the detailed statistics show that a test run with an IO read latency of 6 ms breaks down, within 1 hour, as follows:

30 minutes - CPU consumption by application processes
 8 minutes - CPU consumption on the database
20 minutes - disk waits for random reads
 2 minutes - log writer waiting for disk
----------------------------------------------------
60 minutes total elapsed time

If the disk read latency increases from 6 ms/read to 12 ms/read, the total time for disk reads will increase from 20 minutes to 40 minutes, and the total elapsed time from 60 minutes to 80 minutes. (Assumption: no other parameter was changed, the number of disk reads is constant, and the change in total disk read time is proportional to the change in disk latency.)
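The same proportional model can be written compactly (a sketch under the stated assumptions; N is the constant number of random disk reads and \ell the average read latency):

  T(\ell) = T_{\mathrm{app\ CPU}} + T_{\mathrm{db\ CPU}} + T_{\mathrm{log\ writer}} + N \cdot \ell

With the numbers above: N = 20\,\mathrm{min} / 6\,\mathrm{ms} = 200{,}000 reads, so T(12\,\mathrm{ms}) = 40\,\mathrm{min} + 200{,}000 \cdot 12\,\mathrm{ms} = 80\,\mathrm{min}.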
If the number of disk reads is lower in some tests, this could be the result of Trap 1 - "Unrealistic Cache Hits" - in the database or application cache.

Documenting the time slices for each test run, as well as the number of disk reads/writes and the disk latency, should show a clear correlation with the varying disk latency.

This detailed documentation of the impact of varying disk latency is an additional benefit for future production use, as varying disk performance is also observed on most production systems.

Conducting an extra series of tests

Run several performance tests with the same test case and the same parameters, except for the varying IO performance.
Obstacle: you might have no way to influence the IO performance, and if it happens to be stable you won't get the variation you need.
Often the storage team monitors and documents SAN performance over time. With good communication with the storage team, you might learn when backups are run, when SAN-internal activities (synchronizing / establishing a third mirror) take place, or when other servers connected to the SAN run IO-intensive jobs impacting overall SAN performance. This will help you schedule performance tests during those periods.

Otherwise, you might already have detected a pattern of varying IO performance over time (and perhaps already traced it back to, e.g., concurrent backups), such that you can schedule test runs at those times.

Summary of Trap 5

Understanding and documenting the impact of varying disk performance during performance testing is valuable information for when varying performance in the production stage needs to be investigated and explained.

In the optimal case, a mathematical model to determine the impact of varying storage performance has been developed during the performance tests.
