Java Performance Tuning
Java performance tuning tips
Note that this page is very large. The tips on this page are categorized
in other pages. Use
the tips index page to access smaller focused listings of tips.
This page lists many other pages available on the web, together with a condensed
list of tuning tips that each page includes. For the most part I've eliminated
any tips that are wrong, but one or two may have slipped past me. Remember that
the tuning tips listed are not necessarily good coding practice. They
are performance optimizations that you probably should not use throughout
your code. Instead they apply to speeding up critical sections of code
where performance has already been identified as a problem.
The tips here include only those that are available online for free. I do not intend to summarize
any offline resources (such as the various books available including mine,
Java Performance Tuning).
The tips here are of very variable quality and usefulness, some real gems but
some dross and quite a bit of repetition. Comments in square brackets, [], have
been added by me.
Use this page by using your browser's "find" or "search" option to identify
particular tips you are interested in on the page, and follow up by reading
the referenced web page if clarification is necessary.
This page is currently 411KB. This page is updated once a month. You can receive
email notification of any changes by subscribing to the newsletter.
http://www.onjava.com/pub/a/onjava/2001/02/22/optimization.html
Performance planning for managers (Page last updated February 2001, Added 2001-03-21, Author Jack Shirazi, Publisher OnJava). Tips:
- Include budget for performance management.
- Create internal performance experts.
- Set performance requirements in the specifications.
- Include a performance focus in the analysis.
- Require performance predictions from the design.
- Create a performance test environment.
- Test a simulation or skeleton system for validation.
- Integrate performance logging into the application layer boundaries.
- Performance test the system at multiple scales and tune using the resulting information.
- Deploy the system with performance logging features.
ftp://ftp.ora.com/pub/examples/java/javapt/technique-list.html
A long list of most of the tuning techniques covered in my "Java Performance Tuning" book (Page last updated August 2000, Added 2000-10-23, Author Jack Shirazi, Publisher O'Reilly). Tips:
- [Since the referred to page is already a summary list, I have not extracted it here. Especially since there are nearly 300 techniques listed. Check the page out directly].
http://www.onjava.com/pub/a/onjava/2001/05/30/optimization.html
Comparing the performance of LinkedLists and ArrayLists (and Vectors) (Page last updated May 2001, Added 2001-06-18, Author Jack Shirazi, Publisher OnJava). Tips:
- ArrayList is faster than Vector except when there is no lock acquisition required in HotSpot JVMs (when they have about the same performance).
- Vector and ArrayList implementations have excellent performance for indexed access and update of elements, since there is no overhead beyond range checking.
- Adding elements to, or deleting elements from the end of a Vector or ArrayList also gives excellent performance except when the capacity is exhausted and the internal array has to be expanded.
- Inserting elements into, and deleting elements from, Vectors and ArrayLists always requires an array copy (two copies when the internal array must be grown first). The number of elements to be copied is proportional to [size-index], i.e. to the distance between the insertion/deletion index and the last index in the collection. The array copying overhead grows significantly as the size of the collection increases, because the number of elements that need to be copied with each insertion increases.
- For insertions to Vectors and ArrayLists, inserting to the front of the collection (index 0) gives the worst performance, inserting at the end of the collection (after the last element) gives the best performance.
- LinkedLists have a performance overhead for indexed access and update of elements, since access to any index requires you to traverse multiple nodes.
- LinkedList insertion/deletion overhead is dependent on how far away the insertion/deletion index is from the closer end of the collection.
- Synchronized wrappers (obtained from Collections.synchronizedList(List)) add a level of indirection which can have a high performance cost.
- Only List and Map have efficient thread-safe implementations: the Vector and Hashtable classes respectively.
- List insertion speed is critically dependent on the size of the collection and the position where the element is to be inserted.
- For small collections ArrayList and LinkedList are close in performance, though ArrayList is generally the faster of the two. Precise speed comparisons depend on the JVM and the index where the object is being added.
- Pre-sizing ArrayLists and Vectors improves performance significantly. LinkedLists cannot be pre-sized.
- ArrayLists can generate far fewer objects for the garbage collector to reclaim, compared to LinkedLists.
- For medium to large sized Lists, the location where elements are to be inserted is critical to the performance of the list. ArrayLists have the edge for random access.
- A dedicated List implementation designed to match data, collection types and data manipulation algorithms will always provide the best performance.
- ArrayList internal node traversal from the start to the end of the collection is significantly faster than LinkedList traversal. Consequently queries implemented in the class can be faster.
- Iterator traversal of all elements is faster for ArrayList compared to LinkedList.
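The pre-sizing and insertion-position tips above can be illustrated with a minimal sketch. The class and method names are hypothetical, and the choice of list per access pattern is only a starting point; actual speeds depend on the JVM and the collection size, so measure before committing.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ListInsertionSketch {
    // Builds a list of n elements by always appending at the end,
    // the cheapest insertion position for an ArrayList.
    static List<Integer> appendAll(int n) {
        // Pre-size so the internal array never has to be grown.
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Builds a list by always inserting at index 0. A LinkedList is used
    // here because front insertion only relinks nodes and copies no array.
    static List<Integer> prependAll(int n) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            list.add(0, i);
        }
        return list;
    }
}
```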
http://www.onjava.com/pub/a/onjava/2001/07/09/optimization.html
Using the WeakHashMap class (Page last updated June 2001, Added 2001-07-20, Author Jack Shirazi, Publisher OnJava). Tips:
- WeakHashMap can be used to reduce memory leaks. Keys that are no longer strongly referenced from the application will automatically make the corresponding value reclaimable.
- To use WeakHashMap as a cache, the keys that evaluate as equal must be recreatable.
- Using WeakHashMap as a cache gives you less control over when cache elements are removed compared with other cache types.
- Clearing elements of a WeakHashMap is a two stage process: first the key is reclaimed, then the corresponding value is released from the WeakHashMap.
- String literals and other objects like Class which are held directly by the JVM are not useful as keys to a WeakHashMap, as they are not necessarily reclaimable when the application no longer references them.
- The WeakHashMap values are not released until the WeakHashMap is altered in some way. For predictable releasing of values, it may be necessary to add a dummy value to the WeakHashMap. If you do not call any mutator methods after populating the WeakHashMap, the values and internal WeakReference objects will never be dereferenced [no longer true from 1.4, where most methods now allow values to be released].
- WeakHashMap wraps an internal HashMap adding an extra level of indirection which can be a significant performance overhead. [no longer true from 1.4].
- Every call to get() creates a new WeakReference object. [no longer true from 1.4].
- WeakHashMap.size() iterates through the keys, making it an operation that takes time proportional to the size of the WeakHashMap. [no longer true from 1.4].
- WeakHashMap.isEmpty() iterates through the collection looking for a non-null key, so a WeakHashMap which is empty requires more time for isEmpty() to return than a similar WeakHashMap which is not empty. [no longer true from 1.4, where isEmpty() is now slower than previous versions].
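A minimal sketch of the "recreatable key" requirement, assuming a hypothetical cache class: a lookup succeeds with any key that is equal to the original, but the entry only stays in the map while some key object used for insertion is still strongly referenced elsewhere. When the entry actually disappears depends on the garbage collector, so nothing here should be read as a guarantee of timing.

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakCacheSketch {
    // Entries become reclaimable once their key object is no longer
    // strongly referenced anywhere outside this map.
    private final Map<Object, String> cache = new WeakHashMap<>();

    void put(Object key, String value) {
        cache.put(key, value);
    }

    // Returns the cached value, or null if it was never cached or the
    // entry has already been cleared after its key was collected.
    String get(Object key) {
        return cache.get(key);
    }
}
```

Callers should treat a null result as a cache miss and recompute the value. A key built at run time (for example with new String(...)) is reclaimable; a string literal is held by the JVM and may never be released.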
http://java.oreilly.com/news/jptsummary_1100.html
ftp://ftp.ora.com/pub/examples/java/javapt/summary.html
A high level overview of technical performance tuning, covering 5 levels of tuning competence. (Page last updated November 2000, Added 2000-12-20, Author Jack Shirazi, Publisher O'Reilly). Tips:
- Start tuning by examining the application architecture for potential bottlenecks.
- Architecture bottlenecks are often easy to spot: they are the connecting lines on the diagrams; the single threaded components; the components with many connecting lines attached; etc.
- Ensure that application performance is measurable for the given performance targets.
- Ensure that there is a test environment which represents the running system. This test-bed should support testing the application at different loads, including a low load and a fully scaled load representing maximum expected usage.
- After targeting design and architecture, the biggest bang for your buck in terms of improving performance is choosing a better VM, and then choosing a better compiler.
- Start code tuning with proof of concept bottleneck removal: this consists of using profilers to identify bottlenecks, then making simplified changes which may only improve the performance at the bottleneck for a specialized set of activities, and proceeding to the next bottleneck. After tuning competence is gained, move to full tuning.
- Each multi-user performance test can typically take a full day to run and analyse. Even simple multi-user performance tuning can take several weeks.
- After the easily identified bottlenecks have been removed, the remaining performance improvements often come mainly from targeting loops, structures and algorithms.
- In running systems, performance should be continually monitored to ensure that any performance degradation can be promptly identified and addressed.
http://www.oreilly.com/catalog/javapt/chapter/ch04.html
Chapter 4 of "Java Performance Tuning", "Object Creation". (Page last updated September 2000, Added 2000-10-23, Author Jack Shirazi, Publisher O'Reilly). Tips:
- Establish whether you have a memory problem.
- Reduce the number of temporary objects being used, especially in loops.
- Avoid creating temporary objects within frequently called methods.
- Presize collection objects.
- Reuse objects where possible.
- Empty collection objects before reusing them. (Do not shrink them unless they are very large.)
- Use custom conversion methods for converting between data types (especially strings and streams) to reduce the number of temporary objects.
- Define methods that accept reusable objects to be filled in with data, rather than methods that return objects holding that data. (Or you can return immutable objects.)
- Canonicalize objects wherever possible. Compare canonicalized objects by identity. [Canonicalizing objects means having only a single reference of an object, with no copies possible].
- Create only the number of objects a class logically needs (if that is a small number of objects).
- Replace strings and other objects with integer constants. Compare these integers by identity.
- Use primitive data types instead of objects as instance variables.
- Avoid creating an object that is only for accessing a method.
- Flatten objects to reduce the number of nested objects.
- Preallocate storage for large collections of objects by mapping the instance variables into multiple arrays.
- Use StringBuffer rather than the string concatenation operator (+).
- Use methods that alter objects directly without making copies.
- Create or use specific classes that handle primitive data types rather than wrapping the primitive data types.
- Consider using a ThreadLocal to provide threaded access to singletons with state.
- Use the final modifier on instance-variable definitions to create immutable internally accessible objects.
- Use WeakReferences to hold elements in large canonical lookup tables. (Use SoftReferences for cache elements.)
- Reduce object-creation bottlenecks by targeting the object-creation process.
- Keep constructors simple and inheritance hierarchies shallow.
- Avoid initializing instance variables more than once.
- Use the clone() method to avoid calling any constructors.
- Clone arrays if that makes their creation faster.
- Create copies of simple arrays faster by initializing them; create copies of complex arrays faster by cloning them.
- Eliminate object-creation bottlenecks by moving object creation to an alternative time.
- Create objects early, when there is spare time in the application, and hold those objects until required.
- Use lazy initialization when there are objects or variables that may never be used, or when you need to distribute the load of creating objects.
- Use lazy initialization only when there is a defined merit in the design, or when identifying a bottleneck which is alleviated using lazy initialization.
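Three of the tips above (fill in a reusable object, build strings in a pre-sized buffer, initialize lazily) might look like the following sketch. All names are hypothetical, and the lazy initializer is deliberately simple: it is not thread-safe and would need synchronization if called from several threads.

```java
class ObjectReuseSketch {
    // Hypothetical mutable holder that callers allocate once and reuse.
    static final class Point {
        int x;
        int y;
    }

    // Fills in a caller-supplied object instead of returning a new one,
    // so a loop calling this creates no temporary Point objects.
    static void midpoint(Point a, Point b, Point result) {
        result.x = (a.x + b.x) / 2;
        result.y = (a.y + b.y) / 2;
    }

    // Concatenates in a single pre-sized buffer rather than with repeated +.
    static String join(String[] parts) {
        int length = 0;
        for (int i = 0; i < parts.length; i++) {
            length += parts[i].length();
        }
        StringBuffer buffer = new StringBuffer(length);
        for (int i = 0; i < parts.length; i++) {
            buffer.append(parts[i]);
        }
        return buffer.toString();
    }

    private static int[] squares;

    // Lazy initialization: the table is built on first use only.
    // Not thread-safe as written.
    static int[] squares() {
        if (squares == null) {
            squares = new int[100];
            for (int i = 0; i < squares.length; i++) {
                squares[i] = i * i;
            }
        }
        return squares;
    }
}
```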
http://java.oreilly.com/news/javaperf_0900.html
My article on basic optimizations for queries on collections (Page last updated September 2000, Added 2000-10-23, Author Jack Shirazi, Publisher O'Reilly). Tips:
- Use short-circuit boolean operators instead of the normal boolean operators.
- Eliminate any unnecessarily repeated method calls from loops.
- Eliminate unnecessary casts.
- Avoid synchronization where possible.
- Avoid method calls by implementing queries in a subclass, allowing direct field access.
- Use temporary local variables to manipulate data fields (instance/class variables).
- Use more precise object typing where possible.
- Before manual tuning, HotSpot VMs are often faster than JIT VMs. But JIT VMs tend to benefit more from manual tuning and can end up faster than HotSpot VMs.
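As a rough illustration of several of these tips together, the hypothetical query below hoists the size() call out of the loop, accumulates into a local variable, and relies on the short-circuit && so that startsWith() is never called on a null element.

```java
import java.util.List;

class CollectionQuerySketch {
    // Counts the non-null strings that start with the given prefix.
    static int countWithPrefix(List<String> items, String prefix) {
        int count = 0;                 // local accumulator, not a field
        int size = items.size();       // called once, not on every pass
        for (int i = 0; i < size; i++) {
            String s = items.get(i);
            // && short-circuits: startsWith() is skipped when s is null.
            if (s != null && s.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```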
http://www.javaworld.com/javaworld/jw-11-2000/jw-1117-optimize.html
Article about optimizing queries on Maps. (Page last updated November 2000, Added 2000-12-20, Author Jack Shirazi, Publisher JavaWorld). Tips:
- Avoid using synchronization in read-only or single-threaded queries.
- In the SDK, Enumerations are faster than Iterators due to the specific implementations.
- Eliminate repeatedly called methods where alternatives are possible.
- Iterator.hasNext() and Enumeration.hasMoreElements() do not need to be repeatedly called when the size of the collection is known. Use collection.size() and a loop counter instead.
- Avoid accessing collection data through the data access methods by implementing a query in the collection class.
- Eliminate repeated casts by casting once and holding the cast item in a correctly typed variable.
- Reimplement the collection class to specialize for the data being held in the collection.
- Reimplement the Map class to use a hash function which is more efficient for the data being mapped.
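The loop-counter and cast-once tips can be sketched as follows. The class name and the query are hypothetical, and the counted loop assumes that nothing modifies the map while it is being traversed.

```java
import java.util.Iterator;
import java.util.Map;

class MapQuerySketch {
    // Sums the lengths of all String values held in the map.
    static int totalStringLength(Map<String, Object> map) {
        int total = 0;
        Iterator<Object> values = map.values().iterator();
        // The size is known up front, so count down instead of
        // calling hasNext() before every element.
        for (int remaining = map.size(); remaining > 0; remaining--) {
            Object value = values.next();
            if (value instanceof String) {
                String s = (String) value;   // cast once, then reuse s
                total += s.length();
            }
        }
        return total;
    }
}
```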
http://www.onjava.com/pub/a/onjava/2001/01/25/hash_functions.html
Optimizing hash functions: generating a perfect hash function (Page last updated January 2001, Added 2001-02-21, Author Jack Shirazi, Publisher OnJava). Tips:
- Perfect hash functions guarantee that every key maps to a separate entry in a hashtable, and so provide more efficient hashtable implementations than generic hash functions.
- Perfect hash functions are possible when the key data is restricted to a known set of elements.
- Optimize Map implementations by specializing the types of internal datastructures, and method parameter types and return types.
- Optimize Map implementations by using a specialized hash function that is optimized for the key type, rather than generic to all possible types of keys.
- Generate a perfect hash function using some variable combination of simple arithmetic operators.
- Perfect hash functions may require excessive amounts of memory.
- Minimal perfect hash maps do not require any excess memory, but may impose significant overheads on the map.
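One very simple way to generate a perfect hash function for a fixed key set is to search for a table size that produces no collisions under the remainder operator. This is only a sketch under that assumption, not necessarily the generator used in the article; the names are hypothetical, and the keys must be distinct and non-empty.

```java
class PerfectHashSketch {
    // Returns the smallest table size m >= keys.length for which
    // floorMod(key, m) sends every key to a different slot.
    // Assumes keys is non-empty and contains no duplicates.
    static int findModulus(int[] keys) {
        for (int m = keys.length; ; m++) {
            boolean[] used = new boolean[m];
            boolean collision = false;
            for (int i = 0; i < keys.length && !collision; i++) {
                int slot = Math.floorMod(keys[i], m);
                collision = used[slot];
                used[slot] = true;
            }
            if (!collision) {
                return m;
            }
        }
    }

    // The resulting hash function: one arithmetic operation, no probing.
    static int slot(int key, int modulus) {
        return Math.floorMod(key, modulus);
    }
}
```

The table size found can be noticeably larger than the number of keys, which is the memory cost the tips above warn about.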
http://www.onjava.com/pub/a/onjava/2002/03/20/optimization.html
Microtuning (Page last updated March 2002, Added 2002-03-25, Author Jack Shirazi, Publisher OnJava). Tips:
- Performance is dependent on data as well as code. Different data can make identical code perform very differently.
- Always start tuning with a baseline measurement.
- The System.currentTimeMillis() method is the most basic measuring tool for tuning.
- You may need to repeatedly call a method in order to reliably measure its average execution time.
- Minimize the possibility that CPU time will be allocated to anything other than the test while it is running by ensuring no other processes are running during the test, and that the test remains in the foreground.
- Baseline measurements normally show some useful information, e.g. the average execution time for one call to a method.
- Multiplying the average time taken to execute a method or sequence of methods, by the number of times that sequence will be called in a time period, gives you an estimate of the fraction of the total time that the sequence takes.
- There are three routes to tuning a method: Consider unexpected differences in different test runs; Analyze the algorithm; Profile the method.
- Creating an exception is a costly procedure, because the stack trace has to be filled in.
- A profiler should ideally be able to take a snapshot of performance between two arbitrary points.
- Tuning is an iterative process: you normally find one bottleneck, make changes that improve performance, test those changes, and then start again.
- Algorithm changes usually provide the best speedup, but can be difficult to find.
- Examining the code for the causes of the differences in speed between two variations of test runs can be useful, but is restricted to those tests for which you can devise alternatives that show significant timing variations.
- Profiling is always an option and almost always provides something that can be speeded up. But the law of diminishing returns kicks in after a while, leaving you with bottlenecks that are not worth speeding up, because the potential speedup is too small for the effort required.
- Generic integer parsing (as with the Integer constructors and methods) may be overkill for converting simple integer formats.
- Simple static methods are probably best left to be inlined by the JIT compiler rather than by hand.
- String.equals() is expensive if you are only testing for an empty string. It is quicker to test if the length of the string is 0.
- Set a target speedup to reach. With no target, tuning can carry on for much longer than is needed.
- A generic tuning procedure is: Identify the bottleneck; Set a performance target; Use representative data; Measure the baseline; Analyze the method; Test the change; Repeat.
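A minimal sketch of the measuring and micro-optimization tips above. The names are hypothetical, and the parser is deliberately restricted: it accepts only a non-empty run of ASCII digits, with no sign, whitespace or overflow handling, which is exactly the kind of specialization the generic Integer methods cannot assume.

```java
class MicrotuningSketch {
    // Specialized parser for simple formats: digits only, no sign,
    // no whitespace, no overflow check.
    static int parseSimpleInt(String s) {
        int value = 0;
        int length = s.length();
        for (int i = 0; i < length; i++) {
            value = value * 10 + (s.charAt(i) - '0');
        }
        return value;
    }

    // Testing the length avoids the work equals("") has to do.
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }

    // Baseline measurement: average milliseconds per call, taken over
    // many repeats because a single call is too short to time reliably.
    static double averageMillis(Runnable task, int repeats) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < repeats; i++) {
            task.run();
        }
        long elapsed = System.currentTimeMillis() - start;
        return (double) elapsed / repeats;
    }
}
```

A baseline would be taken with something like averageMillis(task, 100000) before and after each change, using representative data.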
http://www.onjava.com/pub/a/onjava/2000/12/15/formatting_doubles.html
Efficiently formatting doubles (Page last updated December 2000, Added 2000-12-20, Author Jack Shirazi, Publisher OnJava). Tips:
- Double.toString(double) is slow. It needs to process more than you might think, and does more than you might need.
- Proprietary conversion algorithms can be significantly faster. One such algorithm is presented in the article.
- Converting integers to strings can also be faster than the SDK. The article uses an algorithm that successively strips off the highest digit.
- Formatting numbers using java.text.DecimalFormat is always slower than Double.toString(double), because it first calls Double.toString(double) then parses and converts the result.
- Formatting using a proprietary conversion algorithm can be faster than any of the methods discussed so far, if the number of digits being printed is not large. The actual time taken depends on the number of digits being printed.
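The idea of converting an integer by repeatedly stripping off its highest digit can be sketched as below. This is an illustration of the approach only, not the article's code, and the class name is hypothetical. Whether it beats Integer.toString() on a particular JVM has to be measured.

```java
class IntFormatSketch {
    // Converts an int to its decimal string by emitting the highest
    // digit first and removing it from the remaining value.
    static String format(int value) {
        long v = value;                       // long, so -Integer.MIN_VALUE fits
        StringBuffer out = new StringBuffer(11);
        if (v < 0) {
            out.append('-');
            v = -v;
        }
        // Find the power of ten that matches the highest digit.
        long divisor = 1;
        while (v / divisor >= 10) {
            divisor *= 10;
        }
        while (divisor > 0) {
            long digit = v / divisor;         // current highest digit
            out.append((char) ('0' + digit));
            v -= digit * divisor;             // strip it off
            divisor /= 10;
        }
        return out.toString();
    }
}
```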
http://www.onjava.com/pub/a/onjava/2001/09/25/optimization.html
Multiprocess JVMs (Page last updated September 2001, Added 2001-10-22, Author Jack Shirazi, Publisher OnJava). Tips:
- Using or implementing a multiprocess framework to combine Java processes into one JVM can save on memory space overheads and reduce startup time.
http://www.onjava.com/pub/a/onjava/2001/12/05/optimization.html
Measuring JDBC performance (Page last updated December 2001, Added 2001-12-26, Author Jack Shirazi, Publisher OnJava). Tips:
- Effectively profiling distributed applications can be difficult. I/O can show up as significant in profiling, simply because of the nature of a distributed application.
- It can be unclear whether threads blocking on reads and writes are part of a significant bottleneck or simply a side issue.
- When profiling, it is usually worthwhile to have separate measurements available for the communication subsystems.
- Wrapping the JDBC classes provides an effective technique for measuring database calls.
- [Article discusses how to create JDBC wrappers to measure the performance of database calls].
- If more than a few rows of a query are being read, then the ResultSet.next() method can spend a significant amount of time fetching rows from the database, and this time should be included in measurements of database access.
- JDBC wrappers are simple and robust, and require very little alteration to the application using them (i.e., are low maintenance), so they are suitable to be retained within a deployed application.
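The article builds explicit wrapper classes for the JDBC interfaces. As a shorter stand-in for the same idea, the sketch below uses a java.lang.reflect.Proxy to time every call made through any interface; this is a swapped-in technique, not the article's implementation, and the class and method names are hypothetical. The counters are not thread-safe.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class TimingWrapperSketch implements InvocationHandler {
    private final Object target;   // the real object being measured
    private long totalNanos;       // accumulated time inside the target
    private int calls;             // number of calls forwarded

    TimingWrapperSketch(Object target) {
        this.target = target;
    }

    // Returns a wrapper that implements iface and forwards to the target.
    @SuppressWarnings("unchecked")
    <T> T as(Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                TimingWrapperSketch.class.getClassLoader(),
                new Class<?>[] { iface }, this);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();    // rethrow the target's own exception
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    int getCalls() { return calls; }
    long getTotalNanos() { return totalNanos; }
}
```

With JDBC, one would wrap a connection along the lines of new TimingWrapperSketch(realConnection).as(java.sql.Connection.class). Note that objects returned by the wrapped connection (statements, result sets) are not themselves wrapped by this sketch, so time spent in ResultSet.next() would need a further wrapper to be counted.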
http://www.onjava.com/pub/a/onjava/2001/08/22/optimization.html
Catching OutOfMemoryErrors (Page last updated August 2001, Added 2001-10-22, Author Jack Shirazi, Publisher OnJava). Tips:
- -Xmx and -Xms (-mx and -ms) specify the heap max and starting sizes. Runtime.totalMemory() gives the current process size, Runtime.maxMemory() (available from SDK 1.4) gives the -Xmx value.
- Repeatedly allocating memory by creating objects and holding onto them will expand the process to its maximum possible size. This technique can also be used to flush memory.
- If a process gets too large, the operating system will start paging the process causing a severe decrease in performance.
- It is reasonable to catch the OutOfMemoryError if you can restore your application to a known state that can proceed with processing. For example, daemon service threads can often do this.
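A minimal sketch of a service loop step that survives an OutOfMemoryError, together with a report built from the Runtime methods named above. The names are hypothetical, and the recovery is only sound if the failed task leaves no shared state half-updated.

```java
class MemoryGuardSketch {
    // Runs one unit of work. Returns false if it ran out of memory,
    // so the calling service thread can drop the request and carry on.
    static boolean runGuarded(Runnable task) {
        try {
            task.run();
            return true;
        } catch (OutOfMemoryError e) {
            // Objects allocated by the task are unreachable once the
            // call has unwound, so the collector can reclaim them.
            return false;
        }
    }

    // Reports the current heap size, the -Xmx limit and the free space.
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        return "total=" + rt.totalMemory()
                + " max=" + rt.maxMemory()
                + " free=" + rt.freeMemory();
    }
}
```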
http://www.onjava.com/pub/a/onjava/2001/10/23/optimization.html
The RandomAccess interface. (Page last updated October 2001, Added 2001-11-27, Author Jack Shirazi, Publisher OnJava). Tips:
- A java.util.List object which implements RandomAccess should be faster when using List.get() than when using Iterator.next().
- Use instanceof RandomAccess to test whether to use List.get() or Iterator.next() to traverse a List object.
- [Article describes how to guard the test to support all versions of Java].
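The instanceof test can be sketched as follows for JVMs that already have java.util.RandomAccess (SDK 1.4 onwards); the guard for earlier versions described in the article is not reproduced here. The class name is hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessSketch {
    // Sums a list using whichever traversal suits its implementation.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Indexed access is cheap, e.g. ArrayList and Vector.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // Sequential structures such as LinkedList: use the iterator.
            for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
                total += it.next();
            }
        }
        return total;
    }
}
```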
http://www.cs.berkeley.edu/~mdw/proj/java-nbio/
Whoopee!! A non-blocking I/O library for Java. This is the single most important functionality missing from the SDK for scalable server applications. The important class is SelectSet which allows you to multiplex all your i/o streams. If you want a scalable server and can use this class then DO SO. NOTE THAT SDK 1.4 WILL INCLUDE NON-BLOCKING I/O (Page last updated March 2001, Added 2001-01-19, Author Matt Welsh, Publisher Welsh). Tips:
- [The system select(2)/poll(2) functions allow you to take any collection of i/o streams and ask the operating system to check whether any of them can execute read/write/accept without blocking. The system call will block if requested until any one of the i/o streams is ready to execute. Before Java, no self-respecting server would sit on multiple threads in blocked i/o mode, wasting thread resources: instead select/poll would have been used.]
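The select/poll pattern described above is available in the SDK from 1.4 through java.nio. The sketch below uses that API, not the SelectSet class of the library on this page, and an in-process Pipe stands in for a network connection so that it runs without a server. It assumes short messages (under 256 bytes) that arrive in a single read; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class SelectorSketch {
    // Sends a message through a pipe and reads it back only after the
    // selector has reported the receiving channel as readable.
    static String roundTrip(String message) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                // Channels must be non-blocking before registration.
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

                ByteBuffer buffer = ByteBuffer.allocate(256);
                // Blocks (for at most one second) until a channel is ready.
                if (selector.select(1000) > 0) {
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isReadable()) {
                            source.read(buffer);
                        }
                    }
                    selector.selectedKeys().clear();
                }
                buffer.flip();
                return StandardCharsets.UTF_8.decode(buffer).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real server, many socket channels would be registered with the one selector, so that a single thread services whichever connections are ready.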
http://www.cs.cmu.edu/~jch/java/optimization.html
For years, Jonathan Hardwick's old but classic site was the only coherent Java performance tuning site on the web. He built it while doing his PhD. It wasn't updated beyond March 1998, when he moved to Microsoft, but most tips are still useful and valid. The URL is for the top page, there are another eight pages. Thanks Jonathan. (Page last updated March 1998, Added 2000-10-23, Author Jonathan Hardwick, Publisher Hardwick). Tips:
- Don't optimize as you go. Write your program concentrating on clean, correct, and understandable code.
- Use profiling to find out where that 80% of execution time is going, so you know where to concentrate your effort.
- Always run "before" and "after" benchmarks.
- Use the right algorithms and data structures.
- Compile with optimization flag, javac -O.
- Use a JIT.
- Multithread for multi-processor machines.
- Use clipping to reduce the amount of work done in repaint().
- Use double buffering to improve perceived speed.
- Use image strips or compression to speed up downloading times.
- Animation in Java Applets from JavaWorld and Performing Animation from Sun are two good tutorials.
- Use high-level primitives; it's much faster to call drawPolygon() on a bunch of points than looping with drawLine().
- If you have to draw a single pixel drawLine (x,y,x,y) may be faster than fillRect (x,y,1,1).
- Use Buffered I/O classes.
- Avoid synchronized methods if you can.
- Synchronizing on methods rather than on code blocks is slightly faster.
- Use exceptions only where you really need them.
- Use StringBuffer instead of +.
- Use System.arraycopy() and any other optimized API's available from the SDK.
- Replace the generic standard classes with faster implementations specific to the application.
- Create subclasses to override methods with faster versions.
- Avoid expensive constructs and data structures, e.g. one-dimensional array is faster than a two-dimensional array.
- Use the faster switch bytecode.
- Use private and static methods, and final classes, to encourage inlining by the compiler.
- Reuse objects.
- Local variables are faster than instance variables, which are in turn faster than array elements.
- ints are the fastest data type.
- Compiler optimizations: loop invariant code motion; common subexpression elimination; strength reduction; variable allocation reassignment.
- Use java -prof or other profiler.
- Use a timing harness to run benchmarks.
- Use a memory measurement harness to run benchmarks.
- Call System.gc() before every timing run to minimize inconsistent results due to garbage collection in the middle of a run.
- Use JAR or zip files.
- If size is a constraint: use SDK classes wherever possible; inherit whatever possible; put common code in one place; initialize big arrays at runtime by parsing a string; use short names.
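Two of the tips above, using System.arraycopy() and replacing a two-dimensional array with a one-dimensional one, might be sketched like this. The class and method names are hypothetical.

```java
class ArraySketch {
    // Returns a copy of source with value inserted at index, moving
    // both halves with System.arraycopy() instead of element loops.
    static int[] insert(int[] source, int index, int value) {
        int[] result = new int[source.length + 1];
        System.arraycopy(source, 0, result, 0, index);
        result[index] = value;
        System.arraycopy(source, index, result, index + 1, source.length - index);
        return result;
    }

    // Reads cell (row, col) of a grid stored row by row in one
    // flat array, avoiding the second array lookup of int[][].
    static int cell(int[] grid, int cols, int row, int col) {
        return grid[row * cols + col];
    }
}
```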
http://www.ddjembedded.com/resources/articles/2001/0112g/0112g.htm
Balancing Network Load with Priority Queues (Page last updated December 2001, Added 2002-02-22, Author Frank Fabian, Publisher Dr. Dobb's). Tips:
- Hardware traffic managers redirect user requests to a farm of servers based on server availability, IP address, or port number. All traffic is routed to the load balancer, then requests are fanned out to servers based on the balancing algorithm.
- Popular load-balancing algorithms include: server availability (find a server with available processing capability); IP address management (route to the nearest server by IP address); port number (locate different types of servers on different machines, and route by port number); HTTP header checking (route by URI or cookie, etc).
- Web hits should cater for handling peak hit rate, not the average rate.
- You can model hit rates using a Gaussian distribution to determine the average hit rate per time unit (e.g. per second) at peak usage; a Poisson distribution then gives the probability of a given number of users simultaneously hitting the server within that time unit. [Article gives an example with a Gaussian fitted to peak traffic of 4000 users with a standard deviation of 20 minutes, resulting in an average of 1.33 users per second at the peak, which in turn gives the probabilities of 0, 1, 2, 3, 4, 5, 6 users hitting the server within one second as 26%, 35%, 23%, 10%, 3%, 1%, 0.2%. Service time was 53 milliseconds, which means that the server can service 19 hits per second without requests needing to be queued.]
- System throughput is the arrival rate divided by the service rate. If the ratio becomes greater than one, requests exceed the system capability and will be lost or need to be queued.
- If requests are queued because capacity is exceeded, the throughput must drop sufficiently to handle the queued requests or the system will fail (the service rate must increase or arrival rate decrease). If the average throughput exceeds 1, then the system will fail.
- Sort incoming requests into different priority queues, and service the requests according to the priorities assigned to each queue. [Article gives the example where combining user and automatic requests in one queue can result in a worst case user wait of 3.5 minutes, as opposed to less than 0.1 seconds if priority queues are used].
- [Note that Java application servers often do not show a constant service time. Instead the service time often increases with higher concurrency due to non-linear effects of garbage collection].
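The percentages quoted in the example above follow from the Poisson formula P(k) = e^(-m) * m^k / k! with a mean of m = 1.33 arrivals per second, and the capacity figure is 1000 ms divided by the 53 ms service time, rounded. A small sketch (hypothetical class and method names) that reproduces the arithmetic:

```java
class PeakLoadSketch {
    // Poisson probability of exactly k arrivals in one time unit,
    // given the mean number of arrivals per unit.
    static double poisson(double mean, int k) {
        double p = Math.exp(-mean);        // P(0)
        for (int i = 1; i <= k; i++) {
            p *= mean / i;                 // P(i) = P(i-1) * mean / i
        }
        return p;
    }

    // Requests per second one server can absorb at a fixed service time.
    static double capacityPerSecond(double serviceMillis) {
        return 1000.0 / serviceMillis;
    }
}
```

This assumes a constant service time, which, as noted above, often does not hold for Java application servers under high concurrency.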
http://library.cs.tuiasi.ro/programming/java/cutting_edge_java_game_programming/ewtoc.html
"Cutting Edge Java Game Programming". Oldish but still useful intro book to games programming using Java. (Page last updated 1996, Added 2001-06-18, Author Neil Bartlett, Steve Simkin , Publisher Coriolis). Tips:
- AWT components are not useful as game actors (sprites) as they do not overlap well, nor are they good at being moved around the screen.
- Celled image files efficiently store an animated image by dividing an image into a rectangular grid of cells, and allocating a different animation image to each cell. A sequence of similar images (as you would have for an animation) will be stored and transferred efficiently in most image formats.
- Examining pixels using PixelGrabber is slow.
- drawImage() can throw away and re-load images in response to memory requirements, which can make things slow.
- Pre-load and pre-scale images before using them to get a smoother and faster display.
- The more actors (sprites), the more time it takes to draw and the slower the game appears.
- Use double-buffering to move actors (sprites), by redrawing the actor and background for the relevant area.
- Redraw speed depends on: how quickly each object is drawn; how many objects are drawn; how much of each object is drawn; the total number of drawing operations. You need to reduce some or all of these until you get to about 30 redraws per second.
- Don't draw actors or images that cannot be seen.
- If an actor is not moving then incorporate the actor as part of the background.
- Only redraw the area that has changed, e.g. the old area where an actor was, and the new area where it is. Redrawing several small areas is frequently faster than drawing one large area. For the redraws, eliminate overlapping areas and merge adjacent (close) areas so that the number of redraws is kept to a minimum.
- Put slow and fast drawing requirements in separate threads.
- Bounding-box detection can use circles for the bounding box which requires a simple radii detection.
- Load sounds in a background thread.
- Make sure you have a throttle control that can make the game run slower (or pause) when necessary.
- The optimal network topology for network games depends on the number of users.
- If the cumulative downloading of your applet exceeds the player's patience, you've lost a customer.
- The user interface should always be responsive. A non-responsive window means you will lose your players. Give feedback on necessary delays. Provide distractions when unavoidable delays will be lengthy [more than a few seconds].
- Transmission time varies, and is always slow compared to operations on the local hardware. You may need to decide the outcome of the action locally, then broadcast the result of the action. This may require some synchronization resolution.
- Latency between networked players can easily lead to de-synchronized action and player frustration. Displays should locally simulate remote action as continuing current activities/motions, until the display is updated. On update, the actual current situation should be smoothly resolved with the simulated current situation.
- Sending activity updates more frequently ensures smoother play and better synchronization between networked players, but requires more CPU effort and so affects the local display. In order to avoid adversely affecting local displays, send activity updates from a low priority thread.
- Discard any out-of-date updates: always use the latest dated update.
- A minimum broadcast delay of one-third the average network connection travel time is appropriate. Once you exceed this limit, the additional traffic can cause more grief than benefit.
- Put class files into a (compressed) container for network downloading.
- Avoid repeatedly evaluating invariant expressions in a loop.
- Take advantage of inlining where possible (using final, private and static keywords, and compiling with javac -O)
- Profile the code to determine the expensive methods (e.g. using the -prof option)
- Use a disassembler (e.g. javap) to determine which of various alternative coding formulations produces smaller bytecode.
- To reduce the number of class files and their sizes: use the SDK classes as much as possible; and implement common functionality in one place only.
- To optimize speed: avoid synchronized methods; use buffered I/O; reuse objects; avoid unnecessary screen painting.
- Raycasting is faster than raytracing. Raycasting maps 2D data into a 3D world, drawing entire vertical lines using one ray. Use precalculated values for trigonometric and other functions, based on the angle increments chosen for your raycasting.
- In the absence of a JIT, the polygon drawing routines from the AWT are relatively efficient (compared to array manipulation) and may be faster than texture mapping.
- Without texture mapping, walls can be drawn faster with one call to fillPolygon (rather than line by line).
- An exponential jump search algorithm can be used to reduce ray casts by quickly finding the boundaries where walls end (like a binary search, but doubling the increment until you overshoot, then halving the increment from the last valid wall position).
- It is usually possible to increase performance at the expense of image quality and accuracy. Techniques include reducing pixel depth or display resolution, field interlacing, aliasing. The key, however, is to degrade the image in a way that is likely to be undetectable or unnoticeable to the user. For example a moving player often pays less attention to image quality than a resting or static player.
- Use information gathered during the rendering of one frame to approximate the geometry of the next frame, speeding up its rendering.
- If the geometry and content is not too complicated, binary space partition trees map the view according to what the player can see, and can be faster than ray casting.
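A minimal sketch of the exponential jump search described in the tips above, assuming the wall test is monotonic (true up to the wall's last column, false afterwards). The class name JumpSearch, the method lastWallColumn and the onWall predicate are hypothetical stand-ins; in a raycaster each call to onWall would cost one ray cast.

```java
import java.util.function.IntPredicate;

final class JumpSearch {
    /**
     * Returns the largest column in [start, limit) for which onWall is true.
     * Assumes onWall.test(start) is true and that once the test turns false
     * it stays false for all higher columns.
     */
    static int lastWallColumn(int start, int limit, IntPredicate onWall) {
        int last = start;   // last column known to be on the wall
        int step = 1;
        // Phase 1: double the step until we overshoot the wall or the screen edge.
        while (last + step < limit && onWall.test(last + step)) {
            last += step;
            step *= 2;
        }
        // Phase 2: halve the step, always advancing from the last valid column.
        for (step /= 2; step >= 1; step /= 2) {
            if (last + step < limit && onWall.test(last + step)) {
                last += step;
            }
        }
        return last;
    }
}
```

For a wall that is n columns wide this needs on the order of log n tests rather than n, which is where the saving in ray casts comes from.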
http://www.javaworld.com/javaworld/jw-03-2001/jw-0323-performance.html
Designing remote interfaces (Page last updated March 2001, Added 2001-04-20, Author Brian Goetz, Publisher JavaWorld). Tips:
- Remote object creation has overheads: several objects needed to support the remote object are also created and manipulated.
- Remote method invocations involve a network round-trip and marshalling and unmarshalling of parameters. Together these impose a significant latency on remote method invocations.
- Different object parameters can have very different marshalling and unmarshalling costs.
- A poorly designed remote interface can kill a program's performance.
- Excessive remote invocation network round-trips are a huge performance problem.
- Calling a remote method that returns multiple values contained in a temporary object (such as a Point), rather than making multiple consecutive method calls to retrieve them individually, is likely to be more efficient. (Note that this is exactly the opposite of the advice offered for good performance of local objects.)
- Avoid unnecessary round-trips: retrieve several related items simultaneously in one remote invocation, if possible.
- Avoid returning remote objects when the caller may not need to hold a reference to the remote object.
- Avoid passing complex objects to remote methods when the remote object doesn't necessarily need to have a copy of the object.
- If a common high-level operation requires many consecutive remote method calls, you need to revisit the class's interface.
- A naively designed remote interface can lead to an application that has serious scalability and performance problems.
- [Article gives examples showing the effect of applying the listed advice].
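A small sketch contrasting a chatty remote interface with a coarse-grained one, in the spirit of the Point example in the tips above. All type names here (ActorPosition, ChattyActor, CoarseActor, CountingActor) are invented for illustration and are not taken from the article; the counting class is a local stand-in in which each call represents what would be one network round trip if the object were exported over RMI.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

/** Value object carrying both coordinates in a single marshalled reply. */
final class ActorPosition implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;
    ActorPosition(int x, int y) { this.x = x; this.y = y; }
}

/** Chatty design: reading a position costs two remote calls. */
interface ChattyActor extends Remote {
    int getX() throws RemoteException;
    int getY() throws RemoteException;
}

/** Coarse-grained design: one remote call returns both values. */
interface CoarseActor extends Remote {
    ActorPosition getPosition() throws RemoteException;
}

/** Local stand-in that counts calls instead of going over the network. */
final class CountingActor implements ChattyActor, CoarseActor {
    private final int x = 3;
    private final int y = 4;
    int calls;   // each increment stands for one round trip

    public int getX() { calls++; return x; }
    public int getY() { calls++; return y; }
    public ActorPosition getPosition() { calls++; return new ActorPosition(x, y); }
}
```

Reading both coordinates through ChattyActor costs two calls, through CoarseActor only one; the temporary object is cheap compared with a network round trip.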
http://www.glenmccl.com/jperf/
Glen McCluskey's paper with 30 tuning tips, now free. (Page last updated October 1999, Added 2000-10-23, Author Glen McCluskey, Publisher McCluskey). Tips:
- Faster algorithms are better.
- Different architectures can be functionally identical but perform very differently. Keep performance in mind at the design stage.
- Use the fastest available JVM.
- Use static variables for fields that only need to be assigned once.
- Reuse objects where reasonable, e.g. nodes of a linked list.
- Inline methods manually where appropriate. [Better to use a preprocessor].
- Keep methods short and simple to make them automatic inlining candidates.
- final classes can be faster.
- Synchronized methods are slower than the identical non-synchronized one.
- Consider using non-synchronized classes and synchronized-wrappers.
- Access to private members of inner classes from the enclosing class goes by a method call even if not intended to.
- Use StringBuffer instead of the '+' String concatenation operator.
- Use char[] arrays directly to create Strings rather than StringBuffers.
- '==' is faster than equals().
- intern() Strings to enable identity (==) comparisons.
- Convert strings to char[] arrays to process characters, rather than accessing characters one at a time using String.charAt().
- Creating Doubles from strings is slow.
- Buffer i/o.
- MessageFormat is slow.
- Reuse objects.
- File information such as File.length() requires a system call and can be slow.
- Use System.arraycopy() to copy arrays.
- ArrayList is faster than Vector.
- Preset array capacity to as large as will be required.
- LinkedList is faster than ArrayList for inserting elements to the front of the array, but slower at indexed lookup.
- Program using interfaces so that the actual structure can be easily swapped to improve performance.
- Use the -g:none option to the javac compiler.
- Primitive data wrapper classes (e.g. Integer) are slower than using the primitive data directly.
- Null out references when they are no longer used so that garbage collection can reclaim their space.
- Use SoftReferences to recycle memory when required.
- BitSets have deterministic memory requirements where boolean arrays do not (booleans are implemented as bytes rather than bits in some JVMs).
- Use sparse arrays to hold widely spaced indexable data.
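A short sketch combining three of the tips above: program to the List interface, use a non-synchronized class wrapped only when sharing is needed, and preset the capacity. The class and method names are made up for illustration. StringBuilder is used here as the unsynchronized counterpart of StringBuffer that was added in Java 5, after the paper was written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class WrapperSketch {
    /** Single-threaded use: plain ArrayList with preset capacity, no locking cost. */
    static List<String> newLocalList(int expectedSize) {
        return new ArrayList<>(expectedSize);
    }

    /** Shared use: the same class, wrapped so that each call is synchronized. */
    static List<String> newSharedList(int expectedSize) {
        return Collections.synchronizedList(new ArrayList<>(expectedSize));
    }

    /** Callers depend only on List, so the implementation can be swapped freely. */
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);   // append instead of repeated '+' concatenation
        }
        return sb.toString();
    }
}
```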
http://www.sun.com/solaris/java/wp-java/6.html
Performance tuning part of a white paper about Java on Solaris 2.6. (Page last updated 2000, Added 2000-10-23, Author ?, Publisher Sun). Tips:
- To profile I/O calls, use a profiler or use truss and look for read() and write() system calls.
- Buffer I/O. Tune the buffer size (bigger is usually better if memory is available).
- Use char arrays for all character processing in loops, rather than using the String or StringBuffer classes.
- Avoid character processing using methods (e.g. charAt(), setCharAt()) inside a loop.
- Set the initial StringBuffer size to the maximum string length, if it is known.
- StringTokenizer is very inefficient, and can be optimized by storing the string and delimiter in a character array instead of in String, or by storing the highest delimiter character to allow a quicker check.
- Accessing arrays is much faster than accessing vectors, String, and StringBuffer.
- Use System.arraycopy() to improve performance.
- Vector is convenient to use, but inefficient. Ensure that elementAt() is not used inside a loop.
- FastVector is faster than Vector by making the elementData field public, thus avoiding (synchronized) calls to elementAt().
- Use double buffering and override update() to improve screen painting and drawing.
- Use custom LayoutManagers.
- Repaint only the damaged regions (use ClipRect).
- To improve image handling: use MediaTracker; use your own imageUpdate() method; pre-decode and store the image in an array - image decoding time is greater than loading time. Pre-decoding using PixelGrabber and MemoryImageSource should combine multiple images into one file for maximum speed.
- Increase the initial heap size from the 1-MByte default with -ms and -mx [-Xms and -Xmx].
- Use -verbosegc.
- Take size into account when allocating arrays (for instance, if short is big enough, use it instead of int).
- Avoid allocating objects in loops (readLine() is a common example).
- Minimize synchronization.
- Polling is only acceptable when waiting for outside events and should be performed in a "side" thread. Use wait/notify instead.
- Move loop invariants outside the loop.
- Make tests as simple as possible.
- Perform the loop backwards (this actually performs slightly faster than forward loops do). [Actually it is converting the test to compare against 0 that makes the difference].
- Use only local variables inside a loop; assign class fields to local variables before the loop.
- Move constant conditionals outside loops.
- Combine similar loops.
- Nest the busiest loop, if loops are interchangeable.
- Unroll the loop, as a last resort.
- Convert expressions to table lookups.
- Use caching.
- Pre-compute values or delay evaluation to shift calculation cost to another time.
- [Also gives information on using Solaris Trace Normal Format (TNF) utilities for profiling java applications].
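The loop tips above can be illustrated with a before-and-after sketch. The class LoopSketch and its fields are invented for the example. A current JIT compiler may already perform some of these transformations itself, so treat this as an illustration of the techniques rather than a guaranteed speedup, and measure before adopting it.

```java
final class LoopSketch {
    private final int[] data;
    private final int scale;
    private final int offset;

    LoopSketch(int[] data, int scale, int offset) {
        this.data = data;
        this.scale = scale;
        this.offset = offset;
    }

    /** Straightforward version: reads fields and recomputes scale * offset on every pass. */
    long plainSum() {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i] + scale * offset;
        }
        return total;
    }

    /** Tuned version applying three of the listed tips. */
    long tunedSum() {
        final int[] local = data;          // class field copied to a local variable
        final int bias = scale * offset;   // loop invariant moved outside the loop
        long total = 0;
        for (int i = local.length - 1; i >= 0; i--) {   // loop test compares against 0
            total += local[i] + bias;
        }
        return total;
    }
}
```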
http://www.javareport.com/html/from_pages/article.asp?id=252
Detailed article on load testing systems (Page last updated January 2001, Added 2001-01-19, Author Himanshu Bhatt, Publisher Java Report). Tips:
- Internet systems should be load-tested throughout development.
- Load testing can provide the basis for: Comparing varying architectural approaches; Performance tuning; Capacity planning.
- Initially you should identify the probable performance and scalability based on the requirements. You should be asking about: numbers of users/components; component interactions; throughput and transaction rates; performance requirements.
- Factor in batch requirements and performance characteristics of dependent (sub)systems. Note that additional layers, like security, add overheads to performance.
- Logging and stateful EJB can degrade performance.
- After the initial identification phase, the target should be a model architecture that can be load-tested to feed back information.
- Scalability hotspots are more likely to exist in the tiers that are shared across multiple client sessions.
- Performance measurements should be from presentation start to presentation completion, i.e. user clicks button (start) and information is displayed (completion).
- Use load-test suites and frameworks to perform repeatable load testing.
http://www.devx.com/free/articles/2000/maso01/maso01-1.asp
Article on using syslog to track performance across distributed systems (Page last updated December 2000, Added 2001-01-19, Author Brian Maso, Publisher DevX). Tips:
- Use syslog to log distributed system performance.
- Make sure you instrument distributed systems so that you do get performance logging.
http://www.as400.ibm.com/developer/java/topics/jdbctips.html
JDBC Performance Tips (targeted at AS/400, but generically applicable) (Page last updated February 2001, Added 2001-03-21, Authors Richard Dettinger and Mark Megerian, Publisher IBM). Tips:
- Move to the latest releases of Java as they become available.
- Use prepared statements (PreparedStatement class) [article provides coded example of using Statement vs. PreparedStatement].
- Note that two database calls are made for each row in a ResultSet: one to describe the column, the second to tell the db where to put the data. PreparedStatements make the description calls at construction time, Statements make them on every execution.
- Avoid retrieving unnecessary columns: don't use "SELECT *".
- If you are not using stored procedures or triggers, turn off autocommit. All transaction levels operate faster with autocommit turned off, and doing this means you must code commits. Coding commits while leaving autocommit on will result in extra commits being done for every db operation.
- Use the appropriate transaction level. Increasing performance costs for transaction levels are: TRANSACTION_NONE; TRANSACTION_READ_UNCOMMITTED; TRANSACTION_READ_COMMITTED; TRANSACTION_REPEATABLE_READ; TRANSACTION_SERIALIZABLE. Note that TRANSACTION_NONE, with autocommit set to true gives access to triggers, stored procedures, and large object columns.
- Store string and char data as Unicode (two-byte characters) in the database.
- Avoid expensive database query functions such as: getBestRowIdentifier; getColumns; getCrossReference; getExportedKeys; getImportedKeys; getPrimaryKeys; getTables; getVersionColumns.
- Use connection pooling, either explicitly with your own implementation, or implicitly via a product that supports connection pooling.
- Use blocked fetches (fetching table data in blocks), and tailor the block size to reduce calls to the database, according to the amount of data required.
- Use batch updates (sending multiple rows to the database in one call).
- Use stored procedures where appropriate. These benefit by reducing JDBC complexity, are faster as they use static SQL, and move execution to the server and potentially reduce network trips.
- Use the type-correct get() method, rather than getObject().
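A minimal sketch pulling together several of the JDBC tips above: a PreparedStatement, autocommit turned off with one explicit commit, a batch update, and type-specific setters. The table person and its columns are hypothetical, as are the class and method names; whether turning autocommit off is appropriate depends on the conditions given in the tips.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class JdbcBatchSketch {
    /**
     * Inserts all names in one transaction using one batch.
     * Returns the number of statements the driver reports for the batch.
     */
    static int insertNames(Connection con, String[] names) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);   // commit once, explicitly, at the end
        // Hypothetical table and column names, used only for illustration.
        String sql = "INSERT INTO person (id, name) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (int i = 0; i < names.length; i++) {
                ps.setInt(1, i);             // type-correct setters
                ps.setString(2, names[i]);
                ps.addBatch();               // queue the row locally
            }
            int[] counts = ps.executeBatch();   // send all queued rows in one call
            con.commit();
            return counts.length;
        } catch (SQLException e) {
            con.rollback();   // undo the partial batch
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);   // restore the caller's setting
        }
    }
}
```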
http://www.patrick.net/jpt/index.html
Patrick Killelea's Java performance tips. (Page last updated 1999, Added 2000-10-23, Author Patrick Killelea, Publisher Killelea). Tips:
- System.currentTimeMillis may take up to 0.5 milliseconds to execute.
- The architecture and algorithms of your program are much more important than any low-level optimizations you might perform.
- Tune at the highest level first.
- Make the common case fast (Amdahl's advice).
- Use what you know about the runtime platform or usage patterns.
- Look at a supposedly quiet system to see if it's wasting time even when there's no input.
- Keep small inheritance chains.
- Use stack (local) variables in preference to class variables.
- Merge classes.
- drawPolygon() is faster than using drawLine() repeatedly.
- Don't create too many objects.
- Reuse objects if possible.
- Beware of object leaks (references to objects that are never nulled).
- Accessor methods increase overhead.
- Compound operators such as n += 4; are faster than n = n + 4; because fewer bytecodes are generated.
- Shifting by powers of two is faster than multiplying.
- Multiplication is faster than exponentiation.
- int increments are faster than byte or short increments.
- Floating point increments are much slower than any integral increment.
- Memory access from better to worse: local vars; supersuperclass instance variable; superclass instance var; class instance var; class static var; array elements.
- It can help to copy slower-access vars to fast local vars if you are going to operate on them repeatedly, as in a loop.
- Use networking timeouts, TCP_NODELAY, SO_TIMEOUT, especially in case of dying DNS servers.
- Buffer network io. [or read explicitly in chunks].
- Avoid reverse DNS where you can.
- Use UDP rather than TCP if speed is more important than accuracy.
- Use threads. Prioritize threads. Use notify instead of notifyAll. Use synchronization sparingly.
- Counting down is often faster than counting up. [the loop test comparison to 0 is what matters].
- Keep synchronized methods out of loops if you possibly can.
- Avoid excessive String manipulation.
- Use StringBuffers or arrays rather than Strings.
- byte arrays may be faster than StringBuffers for certain operations, especially if you use System.arraycopy().
- Use StringBuffer rather than the + operator.
- Watch out for slow fonts: fonts vary in speed of rendering.
- Keep the paint method small. It will get called a lot.
- Double buffer where possible.
- For some applications that access the date a lot, it can help to set the local timezone to be GMT, so that no conversion has to take place.
- Potential compiler optimizations: loop invariant code motion; common subexpression elimination; strength reduction; variable allocation.
- Don't turn off native threads.
- Use .jar files.
- Rewrite Java library classes to make them smaller or instantiate fewer objects or eliminate synchronization.
- Install classes locally.
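The networking tips above (timeouts, TCP_NODELAY, SO_TIMEOUT, buffered network I/O) map onto java.net.Socket roughly as sketched below. The class and method names are invented for the example and the timeout values are left to the caller. Note that the connect timeout only bounds the connection attempt itself.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

final class SocketSketch {
    /** Creates an unconnected socket with its options set before connecting. */
    static Socket newTunedSocket(int readTimeoutMillis) throws IOException {
        Socket socket = new Socket();
        socket.setTcpNoDelay(true);               // TCP_NODELAY: send small writes immediately
        socket.setSoTimeout(readTimeoutMillis);   // SO_TIMEOUT: blocking reads give up after this long
        return socket;
    }

    /** Connects with a bounded wait and returns a buffered stream for reading. */
    static InputStream open(Socket socket, String host, int port, int connectTimeoutMillis)
            throws IOException {
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
        return new BufferedInputStream(socket.getInputStream());
    }
}
```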
http://java.sun.com/docs/books/tutorial/extra/fullscreen/
Tutorial on the full screen capabilities in the 1.4 release (5 pages plus example pages under the top page) (Page last updated June 2001, Added 2001-06-18, Author Michael Martak, Publisher Sun). Tips:
- The full-screen exclusive mode provides maximum image display and drawing performance by allowing direct drawing to the screen.
- Use java.awt.GraphicsDevice.isFullScreenSupported() to determine if full-screen exclusive mode is available. If it is not available, full-screen drawing can still be used, but better performance will be obtained by using a fixed size window in normal screen mode. Full-screen exclusive applications should not be resizable.
- Turn off decoration using the setUndecorated() method.
- Change the screen display mode (size, depth and refresh rate), to the best match for your image bit depth and display size so that scaling and other image alterations can be avoided or minimized.
- Don't define the screen painting code in the paint() method called by the AWT thread. Define your own rendering loop for screen drawing, to be executed in any thread other than the AWT thread.
- Use the setIgnoreRepaint() method on your application window and components to turn off all paint events dispatched from the operating system completely, since these may be called during inappropriate times, or worse, end up calling paint, which can lead to race conditions between the AWT event thread and your rendering loop.
- Do not rely on the update or repaint methods for delivering paint events.
- Do not use heavyweight components, since these will still incur the overhead of involving the AWT and the platform's windowing system.
- Use double buffering (drawing to an off-screen buffer, then copying the finished drawing to the screen).
- Use page-flipping (changing the video pointer so that an off-screen buffer becomes the on-screen buffer, with no image copying required).
- Use a flip chain (a sequence of off-screen buffers which the video pointer successively points to one after the other).
- java.awt.image.BufferStrategy provides getDrawGraphics() (to get an off-screen buffer) and show() (to display the buffer on screen).
- Use java.awt.BufferCapabilities to customize the BufferStrategy for optimizing the performance of your application.
- If you use a buffer strategy for double-buffering in a Swing application, you probably want to turn off double-buffering for your Swing components.
- Multi-buffering is only useful when the drawing time exceeds the time spent to do a show.
- Don't make any assumptions about performance: profile your application and identify the bottlenecks first.
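A rough sketch of the rendering-loop approach from the tips above, assuming the AWT API introduced in 1.4. The class name FullScreenSketch, the method run and the drawn content are made up for illustration, and the sketch leaves out display-mode changes and handling of lost buffer contents.

```java
import java.awt.Color;
import java.awt.Frame;
import java.awt.Graphics;
import java.awt.GraphicsDevice;
import java.awt.image.BufferStrategy;

final class FullScreenSketch {
    /** Renders the given number of frames from the calling thread, not from paint(). */
    static void run(GraphicsDevice device, int frames) {
        Frame frame = new Frame(device.getDefaultConfiguration());
        frame.setUndecorated(true);     // no title bar or borders
        frame.setIgnoreRepaint(true);   // ignore paint events from the operating system
        frame.setResizable(false);
        try {
            if (device.isFullScreenSupported()) {
                device.setFullScreenWindow(frame);   // full-screen exclusive mode
            } else {
                frame.setSize(640, 480);             // fall back to a fixed size window
                frame.setVisible(true);
            }
            frame.createBufferStrategy(2);           // double buffering or page flipping
            BufferStrategy strategy = frame.getBufferStrategy();
            for (int i = 0; i < frames; i++) {
                Graphics g = strategy.getDrawGraphics();   // off-screen buffer
                try {
                    g.setColor(Color.BLACK);
                    g.fillRect(0, 0, frame.getWidth(), frame.getHeight());
                    g.setColor(Color.WHITE);
                    g.fillRect(i % Math.max(1, frame.getWidth()), 100, 20, 20);
                } finally {
                    g.dispose();
                }
                strategy.show();   // flip or copy the finished buffer to the screen
            }
        } finally {
            device.setFullScreenWindow(null);   // leave full-screen mode if it was entered
            frame.dispose();
        }
    }
}
```

A caller would typically obtain the device from GraphicsEnvironment.getLocalGraphicsEnvironment().getDefaultScreenDevice() and run the loop in a thread of its own.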
http://www.devresource.hp.com/JavaATC/JavaPerfTune/index.html
HP Java tuning site, including optimizing Java and optimizing HPUX for Java. This is the top page, but several useful pages lie off it (tips extracted for inclusion below). Includes a nice "procedure" list for tuning apps, and some useful forms for what you should record while tuning. (Page last updated 2000, Added 2000-10-23, Author ?, Publisher HP). Tips:
- Have a performance target.
- Consider architecture and components for bottlenecks.
- Third-party components may have options that cause bottlenecks.
- Having debugging turned on can cause performance problems.
- Having logging turned on can cause performance problems.
- Check whether the underlying machine is powerful enough.
- Carefully document any tests and changes.
- Create a performance baseline.
- Make one change at a time.
- Be careful not to lose a winning tune because it's hidden by a bad tune made at the same time.
- Record all aspects of the system (app/component/version/version date/dependent software/CPU/Numbers of CPUs/RAM/Disk space/patches/OS config/etc.)
- Give the JVMs top system priority.
- Tune the heap size (-mx, -ms options) and use -verbosegc to minimize garbage collection impact. A larger heap reduces the frequency of garbage collection but increases the length of time that any particular garbage collection takes.
- Rules of thumb are: 50% of free space available after a gc; set the maximum heap size to be 3-4 times the space required for the estimated maximum number of live objects; set the initial heap size to a little below the space required for the average data set, and the maximum value large enough to handle the largest data set; increase -Xmn for applications that create many short-lived objects [is -Xmn a standard option?]. [These rules of thumb should only be considered as starting points. Ultimately you need to tune the VM heap empirically, i.e. by trial and error].
- You may need to add flags to third party products running in the JVM to eliminate explicit calls to garbage collect (VisiBroker has this known problem).
- Watch out for bottlenecks introduced from third party products. Make sure you know and use the options available, many of which can affect performance (for better or worse). Document the changes you make so that you will be able to reproduce the performance.
- Computationally intensive applications should increase the number of CPUs to increase overall system performance and throughput.
- Be certain that the application's CPU usage is a factor limiting performance: often, highly contended locks and garbage collections that are too frequent will make the system look busy, but little work is done by the application.
- [Some nice detailed description on how to profile and analyze application problems, from the HP system and JVM level at http://www.devresource.hp.com/JavaATC/JavaPerfTune/symptoms_solutions.html.]
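One way to check the "50% free after a gc" rule of thumb from inside an application is sketched below, using only java.lang.Runtime. The class and method names are hypothetical. The explicit gc request is meant for one-off measurement while tuning, not for production code, and the JVM is free to ignore it; the output of -verbosegc remains the more direct source.

```java
final class HeapReport {
    /** Fraction (0..1) of the currently committed heap that is free after requesting a GC. */
    static double freeFractionAfterGc() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();   // only a request; the JVM may ignore it
        return (double) rt.freeMemory() / rt.totalMemory();
    }

    /** Upper bound the heap may grow to, as set by the maximum heap size option. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```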
http://www.sys-con.com/java/article.cfm?id=671
J2EE Application server performance (Page last updated April 2001, Added 2001-04-20, Author Misha Davidson, Publisher Java Developers Journal). Tips:
- Good performance has sub-second latency (response time) and hundreds of (e-commerce) transactions per second.
- Avoid n-way database joins: every join has a multiplicative effect on the amount of work the database has to do. The performance degradation may not be noticeable until large datasets are involved.
- Avoid bringing back thousands of rows of data: this can use a disproportionate amount of resources.
- Cache data when reuse is likely.
- Avoid unnecessary object creation.
- Minimize the use of synchronization.
- Avoid using the SingleThreadModel interface for servlets: write thread-safe code instead.
- ServletRequest.getRemoteHost() is very inefficient, and can take seconds to complete the reverse DNS lookup it performs.
- OutputStream can be faster than PrintWriter. JSPs are only generally slower than servlets when returning binary data, since JSPs always use a PrintWriter, whereas servlets can take advantage of a faster OutputStream.
- Excessive use of custom tags may create unnecessary processing overhead.
- Using multiple levels of BodyTags combined with iteration will likely slow down the processing of the page significantly.
- Use optimistic transactions: write to the database while checking that new data is not being overwritten, by using WHERE clauses containing the old data. However, note that optimistic transactions can lead to worse performance if many transactions fail.
- Use lazy-loading of dependent objects.
- For read-only queries involving large amounts of data, avoid EJB objects and use JavaBeans as an intermediary to access, manipulate and store the data for JSP access.
- Use stateless session EJBs to cache and manage infrequently changed data. Update the EJB occasionally.
- Use a dedicated session bean to perform and cache all JNDI lookups in a minimum number of requests.
- Minimize interprocess communication.
- Use clustering (multiple servers) to increase scalability.
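The idea of performing and caching lookups in one place can be sketched with a plain class, shown below. This is not a session bean: LookupCache and its constructor argument are invented for the example, and in a J2EE container the supplied function would be assumed to wrap a real JNDI call such as InitialContext.lookup(name).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class LookupCache {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();
    private final Function<String, Object> slowLookup;

    /** slowLookup stands in for the expensive call, e.g. a JNDI lookup. */
    LookupCache(Function<String, Object> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Returns the cached object, performing the slow lookup only on first use of a name. */
    Object lookup(String name) {
        return cache.computeIfAbsent(name, slowLookup);
    }
}
```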
http://www.javaworld.com/javaworld/jw-04-2001/jw-0406-syslog.html
Using the Syslog class for logging (Page last updated April 2001, Added 2001-04-20, Author Nate Sammons, Publisher JavaWorld). Tips:
- Use Syslog to log system performance.
- Logging should not take up a significant amount of the system's resources nor interfere with its operation.
- Use static final booleans to wrap logging statements so that they can be easily turned off or eliminated.
- Beware of logging to slow external channels. These will slow down logging, and hence the application too.
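A small sketch of wrapping logging statements in a static final boolean. It uses System.err rather than the Syslog class from the article, and the names GuardedLogging, DEBUG and describe are made up for the example. Because DEBUG is a compile-time constant set to false, the compiler can drop the guarded block, so the costly message is never built.

```java
import java.util.Arrays;

final class GuardedLogging {
    /** Compile-time constant; flip to true and recompile to enable the logging. */
    static final boolean DEBUG = false;

    /** Counts how often the costly log message was actually built. */
    static int messagesBuilt = 0;

    static String describe(int[] values) {
        messagesBuilt++;
        return Arrays.toString(values);
    }

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        if (DEBUG) {   // guarded: no string building when DEBUG is false
            System.err.println("sum of " + describe(values) + " = " + total);
        }
        return total;
    }
}
```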
http://developer.java.sun.com/developer/technicalArticles/Programming/PerfTuning/
Glen McCluskey's article on tuning Java I/O performance. Weak on serialization tuning. (Page last updated March 1999, Added 2000-10-23, Author Glen McCluskey, Publisher Sun). Tips:
- Avoid accessing the disk.
- Avoid accessing the underlying operating system.
- Avoid method calls.
- Avoid processing bytes and characters individually.
- Use buffering either at the class level or at the array level.
- Disable line buffering.
- MessageFormat is slow.
- Reuse objects.
- Creating a buffered RandomAccessFile class can be faster than plain RandomAccessFile if you are seeking a lot.
- Compression can help I/O, but only sometimes.
- Use caching to speed I/O.
- Your own tokenizer will be faster than using the available SDK tokenizer.
- Many java.io.File methods are system calls which can be slow.
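Buffering at the array level, as mentioned in the tips above, can look like the sketch below: the stream is read in blocks and the bytes are processed from the array, rather than calling read() once per byte. The class name, method name and the 8192-byte block size are arbitrary choices for the example.

```java
import java.io.IOException;
import java.io.InputStream;

final class ChunkedRead {
    /** Counts newline bytes by reading blocks into an array. */
    static long countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long lines = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {   // one call fetches up to 8192 bytes
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }
}
```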
http://developer.java.sun.com/developer/technicalArticles/ebeans/ejbperformance/
Designing Entity Beans for Improved Performance (Page last updated March 2001, Added 2001-03-21, Author Beth Stearns, Publisher Sun). Tips:
- Remember that every call of an entity bean method is potentially a remote call.
- Designing with one access method per data attribute should only be used where remote access will not occur, i.e. entities are guaranteed to be in the same container.
- Use a value object which encapsulates all of an entity's data attributes, and which transfers all the data in one network transfer. This may result in large objects being transferred though.
- Group entity bean data attributes in subsets, and use multiple value objects to provide remote access to those subsets.
http://www.bastie.de/resource/res/mjp.pdf and http://www.bastie.de/java/mjperformance/contents.html
Performance tuning report in German. Thanks to Peter Kofler for extracting the tips. (Page last updated November 2001, Added 2001-07-20, Author Sebastian Ritter, Publisher Ritter). Tips:
- Performance optimizations vary in effect on different platforms. Always test for your platforms.
- Reasons not to optimize: can lead to unreadable source code; can cause new errors; optimizations are often compiler/JVM/platform dependent; can lose object orientation.
- Reasons to optimize: application uses too much memory/processor/I/O; application is unacceptably slow.
- Don't optimize before you have at least a functioning prototype and some identified bottlenecks.
- Try to optimize the design first before targeting the implementation.
- Profile applications. Use the 80/20 rule which suggests that 80% of the work is done in 20% of the code.
- Target loops in particular.
- Monitor running applications to maintain performance.
- Plan and budget for some resources to optimize the application. Try to have or develop a couple of performance experts.
- Specify performance in the project requirements, and specify separate performance requirements for the various layers of the application.
- Consider the effects of performance at the analysis stage, and include testing of 3rd party tools.
- Use a benchmark harness to make repeatable performance tests, varying the number of users, data, etc. Use profilers and logging to measure performance and identify performance problems.
- Optimize the runtime system if the optimization does not require alterations to the application design or implementation.
- Test various JVMs and choose the optimal JVM.
- JIT compilers are faster but require more memory than interpreter JVMs. HotSpot can provide better performance and a faster startup and maintain a relatively low memory requirement.
- Design in asynchronous operations so tasks are not waiting for others to finish when they don't need to.
- use the right VM
- use the right threading model (native vs. green)
- use native compilers
- give more ram to the VM
- give all ram to short-lived applications to completely avoid GC
- use alternate/optimizing compilers
- use the right database driver
- use direct JDBC drivers
- expand all JDK classes into the filesystem to speed up access to classes
- use slot-local variables (1st 128 bit = 4 slots) (applies for interpreters only)
- use int
- use ArrayList instead of Vector
- use own Hashtable implementations for primitives (i.e. int)
- use caches
- use object pools
- avoid remote method calls
- use callbacks to avoid blocking remote method calls
- use batching for remote method calls
- use the flyweight pattern to reduce object creation [The flyweight pattern uses a factory instead of 'new' to reuse objects rather than always create new ones].
- use the right access modifier: static > private > final > protected > public
- use inlining
- use shallow hierarchies (to avoid long instantiation chains)
- use empty default constructors
- use direct variable access (not recommended, breaks OO)
- mix model with view (not recommended, breaks OO)
- use better algorithms
- remove redundant code
- optimize loops
- unroll loops
- use int as loop counter
- count/test loops towards 0
- use Exception terminated loops for long loops
- use constants for expressions with known results, e.g. replace x = 3; ... (x does not change) ...; x += 3; with x = 3; ... (x does not change) ...; x = 6;
- move code outside loops
- how to optimize: 1st check for better algorithms, 2nd optimize loops
- use shift for *2 and /2
- do not initialize with default values (0, null)
- use char arrays for mutable Strings
- use arrays instead of collections
- use the "private final" modifier
- use System.arraycopy() to copy arrays
- use Hashtable keys with fast hashCode()
- do not use Strings as keys for Hashtables
- use new Hashtable() instead of Hashtable.clear() for very large Hashtables
- inspect JDK source
- use methods in order: static > final > instance > interface > synchronized
- use own specialized methods instead of JDK's generalized ones
- avoid synchronization
- avoid new objects
- reuse objects
- use the original instead of overloaded constructors (supply the default parameters yourself)
- avoid inner classes
- use + for concatenating 2 Strings, use StringBuffer for concatenating more Strings
- use clone to create new objects (instead of new)
- use instance.hashCode() to test for equality of instances
- use native JDK implemented methods (such as System.arraycopy())
- avoid Exceptions (use Exceptions only for cases with probability < 50%, else use error flags)
- combine multiple small try-catch blocks into one larger block
- use Streams instead of Readers, use Reader and Writer only if you need internationalization
- use buffering for io
- use EOFException and ArrayIndexOutOfBoundsException for terminating io reading loops
- use transient fields to speedup serialisation
- use externalization instead of serialisation
- use multiple threads to increase perceived performance
- use awt instead of swing for speed
- use swing instead of awt for less memory
- use super.paint() to initially draw something (i.e. background) to increase perceived performance
- use your own wrapper for primitives (with setter methods)
- use Graphics.drawPolygon() (native implemented) instead of several Graphics.drawLine() calls.
- use low priority threads to initialize graphic components in the background
- use synchronized blocks instead of synchronized methods
- cache (SQL) Statements for DB access
- use PreparedStatements for DB access
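To illustrate the System.arraycopy() tip in the list above, here is a minimal sketch that copies an array once with a hand-written loop and once with the JDK call. The class and method names (ArrayCopyDemo, copyWithLoop, copyWithArraycopy) are hypothetical and exist only for this example. Whether the bulk copy is actually faster on a particular JVM is something to measure rather than assume.

```java
import java.util.Arrays;

class ArrayCopyDemo {
    // Element-by-element copy written by hand.
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Same result using the JDK's bulk copy routine.
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        // Both approaches should produce identical copies.
        System.out.println(Arrays.equals(copyWithLoop(data), copyWithArraycopy(data)));
    }
}
```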
http://java.sun.com/features/2002/03/swinggui.html
Accelerating GUI apps (after 1.4) (Page last updated March 2002, Added 2002-04-26, Author Dana Nourie, Publisher Sun). Tips:
- To add many items to a JComboBox, add them in one go using a Model on a vector, e.g. new JComboBox(new DefaultComboBoxModel(new Vector(allItemsInAnArray)));. This generates only one changed event.
- Perform GUI operations in bulk to minimize the events generated.
- When initializing or totally replacing the contents of a model, construct a new one instead of reusing the existing one to minimize generated events.
- Use threads other than the GUI handling thread for long, indeterminate, or repetitive tasks.
- VolatileImage allows you to create a hardware-accelerated offscreen image and manage the contents of that image.
- From 1.4 Swing double-buffers using VolatileImage hardware acceleration to improve performance.
- Repaint small regions instead of entire sections or screens. For instance, when using tables, repaint a single table cell as needed instead of repainting the entire screen or table.
- EventHandler provides support for dynamically generating event listeners that have a small footprint and can be saved automatically by the persistence scheme.
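The bulk-update tips above can be sketched with a model and an event-counting listener. This is only an illustration under stated assumptions: ComboModelDemo, CountingListener and the method names are hypothetical, and the generic DefaultComboBoxModel<String> form assumes a JDK newer than the 1.4 release the article targets. Adding items one at a time to a live model fires at least one event per item, whereas a model built from a Vector before any listener is attached fires none, so swapping it in with setModel() costs a single model change.

```java
import java.util.Arrays;
import java.util.Vector;
import javax.swing.DefaultComboBoxModel;
import javax.swing.JComboBox;
import javax.swing.event.ListDataEvent;
import javax.swing.event.ListDataListener;

class ComboModelDemo {
    // Counts every change event the model fires.
    static class CountingListener implements ListDataListener {
        int events;
        public void intervalAdded(ListDataEvent e) { events++; }
        public void intervalRemoved(ListDataEvent e) { events++; }
        public void contentsChanged(ListDataEvent e) { events++; }
    }

    // Adds items one by one to a model that is already being listened to.
    // Returns the number of events fired: at least one per item (the first
    // add may also fire a selection change).
    static int eventsWhenAddingOneByOne(String[] items) {
        DefaultComboBoxModel<String> model = new DefaultComboBoxModel<>();
        CountingListener listener = new CountingListener();
        model.addListDataListener(listener);
        for (String item : items) {
            model.addElement(item);
        }
        return listener.events;
    }

    // Builds a complete model in one go; nothing is listening yet,
    // so no per-item events reach the GUI.
    static DefaultComboBoxModel<String> buildModel(String[] items) {
        return new DefaultComboBoxModel<>(new Vector<>(Arrays.asList(items)));
    }

    // Replaces the whole contents of a combo box with a single model swap.
    static void replaceContents(JComboBox<String> combo, String[] items) {
        combo.setModel(buildModel(items));
    }
}
```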
http://developer.java.sun.com/developer/J2METechTips/2002/tt0325.html
MIDP tips (Page last updated March 2002, Added 2002-04-26, Author Eric Giguere, Publisher Sun). Tips:
- Make HTTP requests in a background thread.
- Use an asynchronous messaging model.
- Use WBXML to compress XML messages.
http://www.javaworld.com/javaworld/jw-09-1996/jw-09-indepth.html
Article about avoiding creating objects where possible. (Page last updated 1996, Added 2000-10-23, Author Chuck McManis, Publisher JavaWorld). Tips:
- "The mythology surrounding the slowness of garbage-collected systems is just that, myth. I can show that the number of instructions executed is the same whether I call malloc() and free() or I only call malloc() and some other code calls free()."
- Simple designs can easily run through many unnecessary objects, e.g. data wrapper objects like Integer.
- Reuse objects where possible.
- Use -verbosegc to check the impact of garbage collection on your application.
http://java.sun.com/people/jag/Fallacies.html
The Eight Fallacies of Distributed Computing (Page last updated 2000, Added 2002-03-25, Author Peter Deutsch, Publisher Sun). Tips:
- The network can fail to deliver at any time.
- Latency is significant.
- Bandwidth is always limited.
http://www.javaworld.com/javaworld/jw-01-2001/jw-0112-performance.html
Article on designing for performance focusing on interfaces (Page last updated January 2001, Added 2001-02-21, Author Brian Goetz, Publisher JavaWorld). Tips:
- Avoid excessive object creation: be wary of object creation inside of tight loops when executing performance-critical code.
- Performance-conscious programmers avoid excessive use of String.
- Defining a utility class which is applied to data required by its constructor means that you must create a new object for every piece of data to run it on. Instead, do not require data in the constructor.
- Do not force methods to provide arguments with input in the form that is convenient rather than efficient. For example, don't require that arguments be passed only as String objects if a byte array or char array would also be functionally equivalent (try to support all formats, especially the efficient ones).
- Defining a method signature in terms of an interchange type (the type of object passed from a caller method to the callee method as an argument) reduces the interface's complexity while maintaining its flexibility, but sometimes this simplicity comes at the cost of performance.
http://java.sun.com/docs/hotspot/PerformanceFAQ.html
HotSpot FAQ (Page last updated August 2000, Added 2001-02-21, Author ?, Publisher Sun). Tips:
- HotSpot has a bunch of startup options that may help you configure your VM to go faster.
- HotSpot garbage collection parameters can be tuned with -Xincgc, -XX:NewSize, -XX:MaxNewSize and -XX:SurvivorRatio (and heap size parameters).
- Sun recommends you no longer use object pools [this is rather a sweeping and inappropriate statement. Object pools are still useful even with HotSpot, but presumably not as often as previously].
- Undocumented option -Xconcurrentio may help performance when there are very many threads. It uses a lighter thread synchronization model.
- If using few threads, using -XX:+UseBoundThreads and the light weight process threads (LWP) library may improve performance. LWP threads are scheduled by the JVM, system threads have kernel scheduling.
- Don't call System.gc().
- Warming loops is no longer necessary from HotSpot 2.0 (SDK 1.3). HotSpot now supports on-stack-replacement.
- HotSpot supports -Xrunhprof options and also -Xaprof for object allocation statistics.
- Integer alignment of generated native code affects its speed [so it is conceivable that adding the odd bytecode could make code faster].
- HotSpot can eliminate "dead variables" and dead code, i.e. variables that are assigned to but never used [in isolated code segments].
- The per-object cost of generational GC varies depending on how long the object lives.
http://www.unixsolutions.hp.com/products/java/perf.html
A different HP tip page on optimizing Java performance, from the "HP-UX Programmer's Guide for Java". Gives info on HP system performance monitoring too (Page last updated ?, Added 2000-10-23, Author ?, Publisher HP). Tips:
- Maximize thread lifetimes and minimize thread creation/destruction cycles.
- Minimize contention for shared resources.
- Minimize creation of short-lived objects.
- Use -verbosegc to monitor garbage collection. Tune the applications to minimize the effects of garbage collections.
- Disk I/O should be minimized. Don't do random I/O to read a file serially (RandomAccessFile class). You should use buffered I/O.
- Complex AWT graphics will slow down your performance.
- Use the most current version of Java.
- Use -mx and -ms to tune the heap size [now -Xms and -Xmx].
- Profile the code to find bottlenecks.
http://www.artima.com/designtechniques/hotspot.html
Bill Venners on "the right way to optimize" (Page last updated May 1998, Added 2000-10-23, Author Bill Venners, Publisher Artima). Tips:
- Don't optimize until you know you have a problem.
- Measure the program before and after your optimization efforts.
- Profile the program to isolate the code that really matters to performance (10 to 20 percent), and just focus your optimization efforts there.
- Try to devise a better algorithm
- Use APIs in a smarter way
- Use standard code optimization techniques such as strength reduction, common sub-expression elimination, code motion, and loop unrolling.
- Only as a last resort should you sacrifice good object-oriented, thread-safe design and maintainable code in the name of performance.
- Make methods static wherever possible.
- Avoid creating lots of short-lived objects
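Two of the standard techniques named above, code motion and strength reduction, can be shown on a small loop. The sketch below is an assumption-laden illustration rather than anything taken from the article: LoopOptimizationDemo and its methods are hypothetical names, and a modern JIT compiler may well perform both transformations by itself, so profile before applying them by hand.

```java
class LoopOptimizationDemo {
    // Straightforward version: recomputes the loop-invariant product
    // scale * offset and the multiplication i * stride on every pass.
    static long naive(int[] data, int scale, int offset, int stride) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] * (long) (scale * offset) + (long) i * stride;
        }
        return sum;
    }

    // Hand-optimized version producing the same result.
    static long optimized(int[] data, int scale, int offset, int stride) {
        long sum = 0;
        // Code motion: the invariant product is computed once, outside the loop.
        final long factor = (long) (scale * offset);
        // Strength reduction: i * stride becomes a running addition.
        long step = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] * factor + step;
            step += stride;
        }
        return sum;
    }
}
```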
http://www.informit.com/content/index.asp?product_id={11E331A5-5A08-4FFD-B018-2A7E24D0359B}
Application performance tuning (Page last updated July 2002, Added 2002-07-24, Author Baya Pavliashvili and Kevin Kline, Publisher informIT). Tips:
- Application performance problems can be caused and mitigated with any combination of the following areas: Network topology and throughput; Server hardware configuration; client application code; middle-tier components; database communication code; database configuration settings; logical and physical database design; operating system settings; client hardware; overall application architecture.
- Monitor the application. Primary statistics worth analyzing are: the number of concurrent users; number of transactions per unit of time; duration of the longest and shortest transactions; and the average response time.
- Specify the performance targets.
- Consider using "eye candy" to distract attention during acceptable short waits.
- Identify which application tier contains the bottleneck and fix that. It might be hardware or software; low-level or architecture.
- Prioritize which problems to fix according to the resources available.
http://www.javaworld.com/javaworld/jw-11-1999/jw-11-performance.html
Object management article (Page last updated November 1999, Added 2000-12-20, Author Dennis M. Sosnoski, Publisher JavaWorld). Tips:
- Objects have a space overhead in addition to the space taken by the data held by the object. The overhead is dependent on the particular JVM, but there is always some. The space overhead is a per object value, so the percentage of overhead decreases with larger objects. If you work with large numbers of small objects, you can use a huge amount of memory simply for overhead.
- Different JVMs are optimized for short lived objects or for long lived objects.
- Object creation and garbage collection have significant overheads.
- Providing you're sensible about creating objects in heavily used code, it's easy to avoid the object churn cycle.
- The easiest way to reduce object creation in your programs is by using primitive types in place of objects.
- Avoid using wrapper classes (for primitive data types, e.g. Integer) as they impose extra overheads.
- If you're working with a large number of primitive data types, you can avoid the excessive object overhead of wrappers by storing and passing values of the underlying primitive types, and only converting the values into the full objects when necessary for use with methods in the class libraries.
- Avoid convenience classes like Point if you can manage the underlying data directly.
- Reuse objects where possible.
- Use object pools where this is helpful in reusing objects, but be careful that the pool implementation does actually give a performance improvement (dedicated pools within the class can be significantly faster than abstract pool implementations).
- Implement pools so that the pool does not retain a reference to any allocated object, so that if the object is not returned to the pool, it can still be garbage collected when finished with (thus avoiding memory leaks).
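The last two pooling tips above can be sketched as a small dedicated pool. This is a minimal sketch under assumptions: BufferPool, maxIdle and idleCount() are hypothetical names, and StringBuilder is used as the pooled type purely for illustration. The pool only references idle objects, so a buffer that is handed out and never returned stays eligible for garbage collection.

```java
import java.util.ArrayDeque;

class BufferPool {
    // Holds only idle buffers; objects handed out are not referenced here.
    private final ArrayDeque<StringBuilder> free = new ArrayDeque<>();
    private final int maxIdle;

    BufferPool(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    // Reuses an idle buffer if one exists, otherwise creates a new one.
    synchronized StringBuilder acquire() {
        StringBuilder sb = free.poll();
        return (sb != null) ? sb : new StringBuilder();
    }

    // Returns a buffer to the pool, clearing it first. Buffers beyond
    // maxIdle are simply dropped and left to the garbage collector.
    synchronized void release(StringBuilder sb) {
        if (free.size() < maxIdle) {
            sb.setLength(0);
            free.push(sb);
        }
    }

    synchronized int idleCount() {
        return free.size();
    }
}
```

As the tip above warns, a pool like this only pays off if measurement shows that reuse is cheaper than allocation for the pooled type.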
http://cin.earthweb.com/public/article/0,,10493_1145241,00.html
Website usability metrics (Page last updated May 2002, Added 2002-07-24, Author Sharon Gaudin, Publisher EarthWeb). Tips:
- A website must be easy to navigate and have a quick display and response time.
- Bad navigation metrics include: abandoned shopping carts; first time visitors look at one or two pages and disappear; dead ends require the "back" button; less than 5% buy something; any broken links.
- Good navigation metrics include: three pages or less from website entry to desired information; no streaming video or Flash introductions; multiple ways to reach the required information; up to date search engines; basic company and contact info one click away from the homepage.
http://itmanagement.earthweb.com/ecom/article/0,,11952_1370691,00.html
Common issues affecting Web performance (Page last updated June 2002, Added 2002-07-24, Author Drew Robb, Publisher EarthWeb). Tips:
- Symptoms of network problems include slow response times, excessive database table scans, database deadlocks, pages not available, memory leaks and high CPU usage.
- Causes of performance problems can include the application design, incorrect database tuning, internal and external network bottlenecks, undersized or non-performing hardware or Web and application server configuration errors.
- Root causes of performance problems come equally from four main areas: databases, Web servers, application servers and the network, with each area typically causing about a quarter of the problems.
- The most common database problems are insufficient indexing, fragmented databases, out-of-date statistics and faulty application design. Solutions include tuning the index, compacting the database, updating the database and rewriting the application so that the database server controls the query process.
- The most common network problems are undersized, misconfigured or incompatible routers, switches, firewalls and load balancers, and inadequate bandwidth somewhere along the communication route.
- The most common application server problems are poor cache management, unoptimized database queries, incorrect software configuration and poor concurrent handling of client requests.
- The most common web server problems are poor design algorithms, incorrect configurations, poorly written code, memory problems and overloaded CPUs.
- Having a testing environment that mirrors the expected real-world environment is very important in achieving good performance.
- The deployed system needs to be tested and continually monitored.
http://www.sys-con.com/java/article.cfm?id=1533
The smallest "Hello World" (Page last updated July 2002, Added 2002-07-24, Author Norman Richards, Publisher Java Developers Journal). Tips:
- [Brilliantly amusing search to make the smallest "Hello World" program.]
- Use the -g:none option to strip debugging bytes from classfiles.
- Most bytes in Java class files are from the constant pool, then the method declarations. The constant pool includes class and method names as well as strings.
- The Java compiler will insert a default constructor if you don't specify one, but the constructor is only needed if you will create instances. You can remove the constructor if you will not be creating instances.
- Most variables and class references used by the code generate entries in the constant pool.
- Reusing already existing constant pool entries for class/method/variable names reduces the class file size.
http://www.javaworld.com/javaworld/jw-11-2000/jw-1110-smartproxy.html
Article on using smart proxies. (Page last updated November 2000, Added 2001-01-19, Author M. Jeff Wilson, Publisher JavaWorld). Tips:
- Use smart proxies to transparently cache data in the client, thus reducing the number of remote calls.
- Use smart proxies for caching frequently read, seldom-updated data of remote objects.
- Use smart proxies to monitor the performance of RMI calls.
- Use smart proxies to prevent returning multiple copies of the same remote object to client code.
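The caching use of a smart proxy described above might look roughly like the following. Everything named here is hypothetical: PriceService stands in for a remote interface (in an RMI setting it would extend java.rmi.Remote and its methods would throw RemoteException), and CachingPriceProxy is an illustrative wrapper, not the article's implementation. The sketch has no expiry or invalidation, so it only suits frequently read, seldom-updated data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical service interface standing in for a remote object.
interface PriceService {
    double priceOf(String productId);
}

// Client-side proxy with the same interface as the remote object.
class CachingPriceProxy implements PriceService {
    private final PriceService remote;
    private final Map<String, Double> cache = new HashMap<>();
    private int remoteCalls;

    CachingPriceProxy(PriceService remote) {
        this.remote = remote;
    }

    // Answers from the local cache when possible; otherwise makes
    // one remote call and remembers the result.
    public synchronized double priceOf(String productId) {
        Double cached = cache.get(productId);
        if (cached == null) {
            remoteCalls++;
            cached = remote.priceOf(productId);
            cache.put(productId, cached);
        }
        return cached;
    }

    // Simple monitoring hook: how many calls actually went remote.
    synchronized int remoteCallCount() {
        return remoteCalls;
    }
}
```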
http://www-4.ibm.com/software/webservers/appserv/ws_bestpractices.pdf
Paper detailing the "Best Practices for Developing High Performance Web and Enterprise Applications" using IBM's WebSphere. All the tips are generally applicable to servlet/EJB development, as well as other types of server development. (Page last updated September 2000, Added 2001-01-19, Author Harvey W. Gunther, Publisher IBM). Tips:
- Do not store large object graphs in javax.servlet.http.HttpSession. Servlets may need to serialize and deserialize HttpSession objects for persistent sessions, and making them large produces a large serialization overhead.
- Use the tag "<%@ page session="false"%>" to avoid creating HttpSessions in JSPs.
- Minimize synchronization in Servlets to avoid multiple execution threads becoming effectively single-threaded.
- Do not use javax.servlet.SingleThreadModel.
- Use JDBC connection pooling, release JDBC resources when done, and reuse datasources for JDBC connections.
- Use the HttpServlet init() method to perform expensive operations that need only be done once.
- Minimize use of System.out.println.
- Avoid String concatenation "+=".
- Access entity beans from session beans, not from client or servlet code.
- Reuse EJB homes.
- Use Read-Only methods where appropriate in entity-beans to avoid unnecessary invocations to store.
- Use the lowest impact transaction level possible for each transaction.
- The EJB "remote programming" model always assumes EJB calls are remote, even where this is not so. Where calls are actually local to the same JVM, try to use calling mechanisms that avoid the remote call.
- Remove stateful session beans (and any other unneeded objects) when finished with, to avoid extra overheads in case the container needs to be passivated.
- Beans.instantiate() incurs a filesystem check to create new bean instances. Use "new" to avoid this overhead.
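As a small illustration of the "+=" tip above, the sketch below builds the same string two ways. ConcatDemo and its method names are hypothetical, and StringBuilder is an assumption of a JDK 5 or later runtime; on the JDKs contemporary with the paper, StringBuffer plays the same role.

```java
class ConcatDemo {
    // Each += creates a new String and copies everything built so far.
    static String joinWithPlusEquals(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // A single builder is appended to in place and converted once at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }
}
```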
http://www-4.ibm.com/software/webservers/appserv/3steps_perf_tuning.pdf
Tuning IBM's WebSphere product. White paper: "Methodology for Production Performance Tuning". Only non-product specific Java tips have been extracted here. (Page last updated September 2000, Added 2001-01-19, Author Gennaro (Jerry) Cuomo, Publisher IBM). Tips:
- A size restricted queue (closed queue) allows system resources to be more tightly managed than an open queue.
- The network provides a front-end queue. A server should be configured to use the network queue as its bottleneck, i.e. only accept a request from the network when there are sufficient resources to process the request. This reduces the load on an app server. However, sufficient requests should be accepted to ensure that the app server is working at maximum capacity, i.e. try not to let a component sit idle while there are still requests that can be accepted even if other components are fully worked.
- Try to balance the workload of the various components.
- [Paper shows a nice throughput curve giving recommended scaling behavior for a server]
- The desirable target bottleneck is the CPU, i.e. a server should be tuned until the CPU is the remaining bottleneck. Adding CPUs is a simple remedy to this.
- Use connection pools and cached prepared statements for database access.
- Object memory management is particularly important for server applications. Typically garbage collection could take between 5% and 20% of the server execution time. Garbage collection statistics provide a useful monitor to determine the server's "health". Use the verbosegc flag to collect basic GC statistics.
- GC statistics to monitor are: total time spent in GC (target less than 15% of execution time); average time per GC; average memory collected per GC; average objects collected per GC.
- For long lived server processes it is particularly important to eliminate memory leaks (references retained to objects and never released).
- Use -ms and -mx to tune the JVM heap. Bigger means more space but GC takes longer. Use the GC statistics to determine the optimal setting, i.e. the setting which provides the minimum average overhead from GC.
- The ability to reload classes is typically achieved by testing a filesystem timestamp. This check should be done at set intermediate periods, and not on every request as the filesystem check is an expensive operation.
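The closed (size-restricted) queue mentioned at the top of this list can be sketched with a bounded queue that refuses work once it is full. This is an illustration under assumptions: ClosedRequestQueue and its methods are hypothetical names, and java.util.concurrent.ArrayBlockingQueue postdates the paper (it arrived in JDK 5). Requests are represented as plain strings to keep the example self-contained.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ClosedRequestQueue {
    private final BlockingQueue<String> pending;

    // The capacity is fixed up front, which is what makes the queue "closed".
    ClosedRequestQueue(int capacity) {
        pending = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false instead of queuing when the server is at capacity,
    // leaving the request waiting upstream (e.g. in the network queue).
    boolean tryAccept(String request) {
        return pending.offer(request);
    }

    // Called by a worker; returns null when nothing is waiting.
    String nextRequest() {
        return pending.poll();
    }
}
```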
http://www.redbooks.ibm.com/abstracts/sg245657.html
WebSphere V3 Performance Tuning Guide (Page last updated March 2000, Added 2001-01-19, Authors Ken Ueno, Tom Alcott, Jeff Carlson, Andrew Dunshea, Hajo Kitzhöfer, Yuko Hayakawa, Frank Mogus, Colin D. Wordsworth, Publisher IBM). Tips:
- [The Red book lists and discusses tuning parameters available to WebSphere]
- Run an application server and any database servers on separate server machines.
- JVM heap size: -mx, -ms [-Xmx, -Xms]. As a starting point for a server based on a single JVM, consider setting the maximum heap size to 1/4 the total physical memory on the server and setting the minimum to 1/2 of the maximum heap. Sun recommends that ms be set to somewhere between 1/10 and 1/4 of the mx setting. They do not recommend setting ms and mx to be the same. Bigger is not always better for heap size. In general increasing the size of the Java heap improves throughput to the point where the heap no longer resides in physical memory. Once the heap begins swapping to disk, Java performance drastically suffers. Therefore, the mx heap setting should be set small enough to contain the heap within physical memory. Also, large heaps can take several seconds to fill up, so garbage collection occurs less frequently which means that pause times due to GC will increase. Use verbosegc to help determine the optimum size that minimizes overall GC.
- In some cases turning off asynchronous garbage collection ("-noasyncgc", not always available to all JVMs) can improve performance.
- Setting the JVM stack and native thread stack size (-oss and -ss) too large (e.g. greater than 2MB) can significantly degrade performance.
- When security is enabled (e.g. SSL, password authentication, security contexts and access lists, encryption, etc) performance is degraded by significant amounts.
- One of the most time-consuming procedures of a database application is establishing a connection to the database. Use connection pooling to minimize this overhead.
http://www.javaworld.com/javaworld/jw-02-2001/jw-0216-ternary.html
Using a ternary search tree for fast searches of partial text matches (Page last updated February 2001, Added 2001-03-21, Author Wally Flint, Publisher JavaWorld). Tips:
- [Article discusses several efficient algorithms for searching through ternary search trees which provide fast partial match searches of character array keys].
http://www-106.ibm.com/developerworks/java/library/j-threads1.html
When synchronization is required (Page last updated July 2001, Added 2001-07-20, Author Brian Goetz, Publisher IBM). Tips:
- Synchronization means mutual exclusion (if the same monitor is used), atomicity of the synchronized block (again with respect to other threads using the same monitor) and synchronization of thread memory to main memory.
- Because synchronization synchronizes thread memory with main memory, there is a cost to synchronization beyond simply acquiring a lock.
- Too little synchronization can lead to corrupt data; too much can lead to reduced performance and deadlock.
- The costs of synchronization vary with JVMs, with more recent JVMs being more efficient.
- The cost of synchronization differs depending on whether threads are actually contending for locks (more expensive, slower) or the synchronization is uncontended, where the thread is basically acting in single-threaded mode (cheaper, faster).
- You need to synchronize or make volatile variables holding data that will be shared between threads.
- Composite operations may need synchronizing to make them atomic even if each individual operation is already synchronized.
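The point about composite operations can be illustrated with a check-then-act sequence on a Hashtable. In this sketch (CompositeOperationDemo and its method names are hypothetical), containsKey() and put() are each synchronized, yet the pair is not atomic unless the caller holds the table's monitor across both calls.

```java
import java.util.Hashtable;
import java.util.Map;

class CompositeOperationDemo {
    private final Map<String, Integer> table = new Hashtable<>();

    // Not atomic: another thread can insert the key between
    // containsKey() and put(), and its value would be overwritten.
    boolean putIfAbsentUnsafe(String key, Integer value) {
        if (!table.containsKey(key)) {
            table.put(key, value);
            return true;
        }
        return false;
    }

    // Atomic: Hashtable's own methods lock the table instance, so
    // holding that same monitor covers the whole check-then-act sequence.
    boolean putIfAbsentSafe(String key, Integer value) {
        synchronized (table) {
            if (!table.containsKey(key)) {
                table.put(key, value);
                return true;
            }
            return false;
        }
    }

    Integer get(String key) {
        return table.get(key);
    }
}
```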
http://www-106.ibm.com/developerworks/java/library/j-threads2.html
Reducing thread contention (Page last updated September 2001, Added 2001-10-22, Author Brian Goetz, Publisher IBM). Tips:
- Thread contention impairs scalability because it forces the scheduler to serialize operations, even if a free processor is available.
- Analyze your program to determine where contention is likely to occur.
- Make synchronized blocks as short as possible.
- Spread synchronizations over more than one lock.
- [Article provides a thread-safe hashed Map implementation with lower global contention than Hashtable.]
- If you will be acquiring and releasing the same lock many times (such as in a loop), acquire the lock before the loop: it is faster to acquire a lock that you already hold than one that nobody holds.
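Spreading synchronization over more than one lock is often done by "striping". The sketch below is a hypothetical illustration (StripedCounters is not the article's Map implementation): a hit counter is split into several stripes, each guarded by its own lock, so threads that hash to different stripes do not contend with each other.

```java
class StripedCounters {
    private final Object[] locks;
    private final long[] counts;

    StripedCounters(int stripes) {
        locks = new Object[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    // Maps a key to a stripe; the mask keeps the index non-negative.
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    // Only the lock for the key's stripe is taken, not a global lock.
    void increment(Object key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            counts[s]++;
        }
    }

    // Sums all stripes, locking each one briefly in turn.
    long total() {
        long sum = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }
}
```

The number of stripes is a tuning parameter: more stripes mean less contention but more memory and a slower total().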
http://www.onjava.com/pub/a/onjava/2002/04/03/javaenterprise_tips.html
J2EE worst practices (Page last updated April 2002, Added 2002-04-26, Author Brett McLaughlin, Publisher OnJava). Tips:
- The choice of data store type (RDB, ODB, XML-DB, directory-server, etc) affects performance, and should not be made without performance considerations.
- Directory servers are optimized for frequent reads, with few writes. If you frequently add data to a directory server, performance degrades.
- Stateless session beans are soooo much faster.
http://www.javaworld.com/javaworld/jw-12-2001/jw-1207-hprof.html
The hprof profiler (Page last updated December 2001, Added 2001-12-26, Author Bill Pierce, Publisher JavaWorld). Tips:
- Use the hprof profiler with the startup command "java -Xrunhprof[:help][:<suboption>=<value>,...] MyMainClass".
- [Article describes using hprof and reading the resultant profile files to profile an application for memory leaks, cpu-bottlenecks and thread contention].
- hprof can be used to profile object allocation (heap option), method bottlenecks (cpu option) and thread contention (monitor option).
http://www.weblogic.com/docs51/admindocs/tuning.html
Weblogic tuning (generally applicable Java tips extracted) (Page last updated June 2000, Added 2001-03-21, Author BEA Systems, Publisher BEA). Tips:
- Response time is affected by: contention and wait times, particularly for shared resources; and software and hardware component performance, i.e. the amount of time that resources are needed.
- A well-designed application can increase performance by simply adding more resources (for instance, an extra server).
- Use clustered or multi-processing machines; use a JIT-enabled JVM; use Java 2 rather than JDK 1.1.
- Use -noclassgc. Use the maximum possible heap size that is still small enough to keep the JVM from swapping (e.g. 80% of RAM left over after other required processes). Consider starting with minimum initial heap size so that the garbage collector doesn't suddenly encounter a full heap with lots of garbage. Benchmarkers sometimes like to set the heap as high as possible to completely avoid GC for the duration of the benchmark.
- Distributing the application over several server JVMs means that GC impact will be spread in time, i.e. the various JVMs will most likely GC at different times from each other.
- On Java 1.1 the most effective heap size is that which limits the longest GC incurred pause to the longest acceptable pause in processing time. This will typically require a reduction in the maximum heap size.
- Too many threads causes too much context switching. Too few threads may underutilize the system. If n=number of threads, k=number of CPUs, then: (n < k) results in an under utilized CPU; (n == k) is theoretically ideal, but each CPU will probably be under utilized; (n > k) by a "moderate amount of threads" is practically ideal; (n > k) by "many threads" can lead to significant performance degradation from context switching. Blocked threads count for less in the previous formulae.
- Symptoms of too few threads: CPU is waiting to do work, but there is work that could be done; Can not get 100% CPU; All threads are blocked [on i/o] and runnable when you do an execution snapshot.
- Symptoms of too many threads: An execution snapshot shows that there is a lot of context switching going on in your JVM; Your performance increases as you decrease the number of threads.
- If many client connections are dropped or refused, the TCP listen queue may be too short.
- Try to avoid excessive cycling (creation/deletion or activation/passivation) of beans.
http://www.weblogic.com/docs51/techdeploy/jdbcperf.html
Weblogic JDBC tuning (Page last updated April 1999, Added 2001-03-21, Author BEA Systems, Publisher BEA). Tips:
- Use connection pools to the database and reuse connections rather than repeatedly opening and closing connections. Optimal pool size is when the connection pool is just large enough to service requests without waits.
- Cache frequently requested data in the JVM and avoid the unnecessary database requests.
- Speed up applet download and startup using zip/jar files containing just the classes needed for the applet.
- Avoid accessing the database wherever possible.
- Fetch rows in batches rather than one at a time, using the batch as a read-ahead mechanism (i.e. pre-fetch rows in batches). Tune the batch size and the number of rows pre-fetched. Avoid pre-fetching BLOBs.
- Avoid moving data unless absolutely necessary. Process the data and produce results as close to its source as possible. Use stored procedures.
- Streamline data before the result crosses the network.
- Use stored procedures to avoid extra network transfers.
- Use built-in DBMS set-based processing to operate on multiple rows/tables in one request.
- Avoid row at a time processing, process multiple rows together wherever possible.
- Counting entries in a table (e.g. using SELECT count(*) from myTable, yourTable where ... ) is resource intensive. Try first selecting into temporary tables, returning only the count, and then sending a refined second query to return only a subset of the rows in the temporary table.
- Proper use of SQL can reduce resource requirements. Use queries which return the minimum of data needed: avoid SELECT * queries. A complex query that returns a small subset of data is more efficient than a simple query that returns more data than is needed.
- Make your queries as smart as possible, i.e. as precise as possible to minimize the data transferred to just that subset that is required.
- Try to batch updates: collect statements together and execute them together in one transaction. Use conditional logic and temporary variables if necessary to achieve statement batching.
- Never let a DBMS transaction span user input.
- Consider using optimistic locking. Optimistic locking employs timestamps to verify that data has not been changed by another user, otherwise the transaction fails.
- Use in-place updates, i.e. change data in rows/tables that already exist rather than adding or deleting rows/tables. Try to avoid moving rows or changing their sizes.
- Store operational data and historic data separately (or more generally store frequently used data separately from infrequently used data).
- Keep your operational data set as small as possible, to avoid having to read through data that is irrelevant.
- DBMSs work well with parallelism. Try to design the application to do other things while interacting with the DBMS.
- Use pipelining and parallelism. Designing applications to support lots of parallel processes working on easily distinguished subsets of the work makes the application faster. If there are multiple steps to processing, try to design your application so that subsequent steps can start working on the portion of data that any prior process has finished, instead of having to wait until the prior process is complete.
- Choose the right driver for your application, i.e. the fastest JDBC driver.
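The optimistic locking tip above can be sketched without a database. The following is an in-memory analogy only, with hypothetical names throughout (VersionedRecord, version(), update()), and it swaps the timestamp the tip mentions for a simple version counter. An update succeeds only if the record is unchanged since the caller read it; otherwise the caller must re-read and retry, or report the conflict.

```java
class VersionedRecord {
    private String value;
    private long version; // stands in for the row's timestamp column

    VersionedRecord(String initial) {
        value = initial;
    }

    synchronized long version() {
        return version;
    }

    synchronized String value() {
        return value;
    }

    // Succeeds only if nobody has updated the record since the caller
    // read versionRead; no lock is held between the read and the update.
    synchronized boolean update(long versionRead, String newValue) {
        if (version != versionRead) {
            return false; // someone else changed the data first
        }
        value = newValue;
        version++;
        return true;
    }
}
```

Against a real DBMS the same check is typically expressed in the WHERE clause of the UPDATE statement, with the application inspecting the affected row count.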
http://www.sys-con.com/websphere/article.cfm?id=40
JDBC optimizing for DB2 (Page last updated April 2002, Added 2002-04-26, Author John Goodson, Publisher WebSphere Developers Journal). Tips:
- Use the same connection to execute multiple statements.
- Keep connection objects open, and reuse them, rather than repeatedly connecting and disconnecting.
- Turn off autocommit, but don't leave transactions open for too long.
- Avoid distributed transactions (transactions that span multiple connections).
- Minimize the data retrieved from the database, both columns and rows. Use setMaxRows, setMaxFieldSize, and setFetchSize.
- Use the most efficiently handled data type: character strings are faster than integers, which are in turn more efficient than floating-point and timestamps.
- Use programmatic updates: updateXXX() calls on updatable resultsets. The resultset is already positioned at a row, which eliminates the usual overhead of finding the row to be updated when using an UPDATE statement.
- Cache any required metadata and use metadata methods as rarely as possible as they are quite slow.
- Avoid using null parameters in metadata queries.
- Use a dummy query to get the metadata for a column, rather than calling getColumns().
- Use parameter markers with stored procedures, rather than embedding data literally in the statement, to minimize parsing overheads.
- Use prepared statements for repeatedly executing SQL statements.
- Choose the optimal cursor: forward-only for sequential reads; insensitive for two-way scrolling. Avoid insensitive cursors for queries that only return one row.
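Several of the tips above (reuse one connection, turn off autocommit, use parameter markers and a prepared statement for repeated SQL) can be combined as in the following sketch. The class name JdbcInsertSketch and the table and column names are hypothetical, and the caller is assumed to supply an already open connection; this is not the code from the article.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcInsertSketch {
    /**
     * Inserts all names in a single transaction on one connection,
     * preparing the statement once and re-executing it with new parameters.
     * Returns the total number of rows reported as inserted.
     */
    static int insertNames(Connection conn, List<String> names) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit for the whole unit of work
        // Hypothetical table and columns, used only for illustration.
        String sql = "INSERT INTO customer (id, name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int total = 0;
            int id = 1;
            for (String name : names) {
                ps.setInt(1, id++);      // parameter markers, not literals in the SQL
                ps.setString(2, name);
                total += ps.executeUpdate();
            }
            conn.commit();               // keep the transaction short
            return total;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit); // leave the connection as we found it
        }
    }
}
```

The connection is deliberately not closed here, so that the caller can keep reusing it for further statements.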
http://www.sys-con.com/java/article.cfm?id=1171
J2EE Performance tuning (Page last updated October 2001, Added 2001-10-22, Author James McGovern, Publisher Java Developers Journal). Tips:
- Call HttpSession.invalidate() to clean up a session when you no longer need to use it.
- For Web pages that don't require session tracking, save resources by turning off automatic session creation using: <%@ page session="false"%>
- Implement the HttpSessionBindingListener interface for all session-scoped beans, and explicitly release resources in the valueUnbound() method.
- Time out sessions more quickly by setting the session timeout or using session.setMaxInactiveInterval().
- Keep-Alive may be extra overhead for dynamic sites.
- Use the include directive <%@ include file="copyleft.html" %> where possible, as this is a compile-time directive (include action <jsp:include page="copyleft.jsp" /> is a runtime directive).
- Use cache tagging where possible.
- Always access entity beans from session beans.
- If only using an entity bean for data access, use JDBC directly instead.
- Use read-only in the deployment descriptor.
- Cache access to EJB homes.
- Use local entity beans when beans are co-located in the same JVM.
- Proprietary stubs can be used for caching and batching data.
- Use a dedicated remote object to generate unique primary keys.
- Follow standard JDBC optimizations: use connection pools; prefer stored procedures or direct SQL; use type 4 drivers; remove extra columns from the result set; use prepared statements when practical; have your DBA tune the query; choose the appropriate transaction levels.
- Consider storing all database character data in Unicode to eliminate conversion overheads. But beware: this step will cause your database size to grow, as Unicode requires 2 bytes per character.
- Use block fetches when the query will give a large ResultSet and all rows are needed. Use the Page-by-Page Iterator pattern when only some of the rows may be needed.
- Consider using an in-memory database (product) for data that doesn't need to be persisted.
- Use an algorithm to prune caches to stop them growing too large.
- Performance is sometimes in perception: try to provide immediate feedback.
- Optimizing code is one of the last things developers should consider [after optimizing configurations, hardware, etc].
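As one possible reading of the cache pruning tip above, the sketch below bounds a cache by evicting the least recently used entry. The article does not name an algorithm; LRU eviction and the class name LruCache are assumptions chosen for illustration. The class relies on the standard LinkedHashMap access-order mode and its removeEldestEntry() hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A size-bounded cache that evicts the least recently used entry. Not thread-safe. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: entries are ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true drops the eldest (least recently used) entry.
        return size() > maxEntries;
    }
}
```

For use from multiple threads, the map would need external synchronization, for example by wrapping it with Collections.synchronizedMap().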
http://www.javaworld.com/javaworld/jw-09-2001/jw-0907-merlin.html
Using nonblocking I/O and memory-mapped buffers in SDK 1.4. (Page last updated September 2001, Added 2001-10-22, Author Michael T. Nygard, Publisher JavaWorld). Tips:
- Before SDK 1.4, servers had a number of performance problems: i/o could easily be blocked; garbage was easily generated when reading i/o; many threads were needed to scale the server.
- Many threads each blocked on i/o is an inefficient architecture in comparison to one thread blocked on many i/o calls (multiplexed i/o).
- Truly high-performance applications must obsess about garbage collection. The more garbage generated, the lower the application throughput.
- A Buffer (java.nio.*Buffer) is a reusable portion of memory. A MappedByteBuffer can map a portion of a file directly into memory.
- Direct Buffer objects can be read/written directly from Channels, but nondirect Buffer objects have a data copy performed for read/writes to i/o (and so are slower and may generate garbage). Convert nondirect Buffers to direct Buffers if they will be used more than once.
- Scatter/gather operations allow i/o to operate to and from several Buffers in one operation, for increased efficiency. Where possible, scatter/gather operations are passed to even more efficient operating system functions.
- Channels can be configured to operate blocking or non-blocking i/o.
- Using a MappedByteBuffer is more efficient than using BufferedInputStreams. The operating system can page into memory more efficiently than BufferedInputStream can do a block read.
- Use Selectors to multiplex i/o and avoid having to block multiple threads waiting on i/o.
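The MappedByteBuffer tip above can be illustrated with a short sketch that reads a file through a read-only memory mapping. The class name MappedReadSketch and the byte-summing task are assumptions made for the example. It also uses java.nio.file.Path and FileChannel.open(), which arrived in Java 7, later than the SDK 1.4 API the article covers.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedReadSketch {
    /**
     * Sums the unsigned byte values of a file via a read-only memory mapping.
     * A single mapping is limited to Integer.MAX_VALUE bytes, so this sketch
     * only suits files below that size.
     */
    static long sumBytes(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The operating system pages the file contents in on demand.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF; // mask to treat the byte as unsigned
            }
            return sum;
        }
    }
}
```

The mapping stays valid after the channel is closed and is only released when the buffer is garbage collected, so on some platforms the file cannot be deleted immediately afterwards.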
http://www.sys-con.com/java/article.cfm?id=1408
Combining apps in one JVM (Page last updated April 2002, Added 2002-04-26, Author Kirk Pepperdine, Publisher Java Developers Journal). Tips:
- Loading multiple applications in the same JVM allows resource sharing and reduces system memory requirements.
- Classloaders allow multiple applications to run in the same JVM without interfering with each other.
- [Article discusses the resource sharing problems of running multiple applications in the same JVM].
http://portals.devx.com/datadirect/Article/6338
JDBC Drivers (Page last updated March 2002, Added 2002-04-26, Author Barrie Sosinsky, Publisher DevX). Tips:
- Type 1 drivers are JDBC-ODBC bridges, plus an ODBC driver. Recommended only for prototyping, not for production. Not suitable for high-transaction environments. Not well supported, and limited in functionality.
- Type 2 drivers use a native API, and are part-Java drivers. Have a binary-code client loading overhead, and may not be fully-featured.
- Type 3 drivers are pure Java drivers that connect to database middleware. They can be server-based, which is frequently faster than types 1 and 2.
- Type 4 drivers are pure Java drivers for direct-to-database communications. This can minimize overheads, and generally provides the fastest driver.
- JDBC 3.0 has additional features to improve performance such as advancements in connection pooling, statement pooling, RowSet objects.
- Opening a connection is the most resource-expensive step in database transactions. Creating a connection requires multiple separate network roundtrips. However, once the connection object has been created, there is little penalty in leaving the connection object in place and reusing it for future connections.
- Connection pooling keeps open a cache of database connection objects, making them available for immediate use. Instead of performing expensive network roundtrips to the database server to open a connection, a connection attempt results in the re-assignment of a connection from the local cache.
- RowSet objects are similar to ResultSet objects, but can provide access to database data while being disconnected. This allows data to be efficiently cached in its simplest form.
- Prepared statement pooling (available from JDBC 3.0) caches SQL queries that have been previously optimized and run so that, should they be needed again, they do not have to go through optimization pre-processing again (avoiding optimization steps, such as checking syntax, validating addresses, and optimizing access paths and execution plans). Statement pooling can be a significant performance booster.
- Statement pooling and connection pooling in JDBC 3.0 can cooperate to share statement pools, so that a connection can use a cached statement from another connection, thus incurring statement preparation overheads only once, on the first execution of some SQL by any connection.
- Database drivers developed by vendors other than the database vendor can be better performing and more fully featured. (Driver vendors concentrate on the driver; database vendors have many other things to consider.)
- Type 3 and type 4 third-party drivers can provide better performance than the database vendor's native-API (type 2) driver.
- Try to use a driver that supports JDBC 3.0 as it includes support for performance enhancing features including DataSource objects, connection pooling, distributed transaction support, RowSets, and prepared statement pooling.
- Type 3 and Type 4 drivers are the drivers to use when performance is important.
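The idea behind connection pooling described above (pay the creation cost once, then hand the same objects out repeatedly) can be sketched generically. The class SimplePool and its acquire()/release() methods are hypothetical names for this illustration. A real application would normally rely on the pooling provided through a JDBC DataSource or the application server rather than on code like this.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** A fixed-size pool of reusable resources, such as database connections. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // the expensive creation happens once, up front
        }
    }

    /** Borrows a resource, waiting if all of them are currently in use. */
    T acquire() throws InterruptedException {
        return idle.take();
    }

    /** Hands a borrowed resource back so that later callers can reuse it. */
    void release(T resource) {
        idle.offer(resource);
    }
}
```

This sketch omits everything a production pool needs, such as validating connections before reuse, timeouts, and closing resources on shutdown.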
http://developer.java.sun.com/developer/Books/EarlyJ2SE/IO.pdf
Shortened version of chapter 2, "I/O", from "Early Adopter J2SE 1.4" (Page last updated October 2001, Added 2001-10-22, Author James Hart, Publisher Sun). Tips:
- Non-blocking I/O can improve performance by minimizing the amount of time spent in I/O calls, though it may add complexity to the application.
- The old I/O classes can now be interrupted more reliably from 1.4.
- FileChannel.transferFrom() is an efficient way to copy data between files.
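A file copy based on FileChannel.transferFrom(), as mentioned in the last tip above, might look like the following sketch. The class name ChannelCopySketch is an assumption, and the channels are opened with the Java 7 FileChannel.open() method rather than the stream-based approach available in J2SE 1.4.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelCopySketch {
    /** Copies source to target, replacing any existing content; returns bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferFrom() may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long moved = out.transferFrom(in, position, size - position);
                if (moved <= 0) {
                    break; // no progress; stop rather than spin
                }
                position += moved;
            }
            return position;
        }
    }
}
```

The transfer can be handed to the operating system where supported, which avoids copying the data through a Java-side buffer.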
http://developer.java.sun.com/developer/Books/EarlyJ2SE/Using.pdf
Shortened version of chapter 5, "Utilities: The Logging Architecture", from "Early Adopter J2SE 1.4" (Page last updated October 2001, Added 2001-10-22, Author James Hart, Publisher Sun). Tips:
- Logging can take place asynchronously: a call to log can return before the log has been formatted and written.
- The logging framework provides methods (in Logger) for recording method