Compression Formats for Network Deployment

To reduce download size and make better use of server and network bandwidth, two compression formats are available for deploying Java applications and applets: gzip and Pack200. With both techniques, the compressed JAR files are transmitted over the network, and the receiving application uncompresses and restores them.

See Reducing the Download Time in the Java Tutorials to create and deploy a compressed JAR file for a rich Internet application.

This section describes the technical details of how a web server handles a compressed JAR file.

Background

Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616) discusses HTTP compression. HTTP compression allows application JAR files to be deployed as compressed JAR files. The compression techniques it supports are gzip, compress, and deflate.

As of SDK/JRE version 5.0, HTTP compression is implemented in Java Web Start and Java Plug-in in compliance with RFC 2616. The supported techniques are gzip and pack200-gzip.

The requesting application can send an HTTP request to the server indicating its ability to handle compressed versions of the file. The following is an example HTTP request created when the Dynamic Tree Demo applet, whose JAR file has been compressed with Pack200, is loaded:

GET http://www.example.com/DynamicTreeDemo.jar.pack.gz HTTP/1.1
accept-encoding: pack200-gzip,gzip
User-Agent: Mozilla/4.0 (Windows 7 6.1) Java/1.7.0
Host: example.com
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

The following is the HTTP response from the server:

HTTP/1.1 200 OK
Date: Wed, 21 Mar 2012 20:13:22 GMT
Server: Apache/2.2.11 (Solaris, Linux, or Mac OS X) mod_ssl/2.2.11 OpenSSL/0.9.8k SVN/1.6.2 DAV/2
Last-Modified: Thu, 08 Mar 2012 03:48:34 GMT
ETag: "489ee5-112d-4bab326774e43"
Accept-Ranges: bytes
Content-Length: 4397
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: application/x-gzip
Content-Encoding: gzip

For more information about the Dynamic Tree Demo applet, see Deploying an Applet in the Java Tutorials.

The Accept-Encoding field, set by the client, specifies which encodings the client can accept. The Content-Encoding field, set by the server, indicates which encoding was applied to the content being sent. The Content-Type field indicates what the client should expect once the transformation or decoding is done.

In this example, the Accept-Encoding field is set to pack200-gzip and gzip, indicating to the server that the application (in this case, Mozilla Firefox running in Windows 7 with the Java Plug-in that comes with JRE 7) can handle pack200-gzip and gzip formats.

The server searches for the requested JAR file with a .pack.gz or .gz file extension and responds with the located file. The server sets the response header Content-Encoding field to pack200-gzip, gzip, or NULL depending on the type of file that is being sent, and optionally may set the Content-Type to application/x-java-archive. Therefore, by inspecting the Content-Encoding field, the requesting application can apply the corresponding transformation to restore the original JAR file.
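The client-side dispatch on the Content-Encoding field described above can be sketched in Java. This is a minimal illustration, not the actual Java Plug-in or Java Web Start implementation. The class name ContentEncodingDispatch and the Restore enum are hypothetical, and treating an absent or empty header as "no transformation" is an assumption based on the NULL case mentioned in the text.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the JRE: maps a Content-Encoding value to restore steps. */
final class ContentEncodingDispatch {

    /** Steps a client would apply to the response body to recover the original JAR file. */
    enum Restore { NONE, GUNZIP, GUNZIP_THEN_UNPACK200 }

    static Restore restoreFor(String contentEncoding) {
        // Assumption: an absent or empty header means the plain JAR file was sent.
        if (contentEncoding == null || contentEncoding.trim().isEmpty()) {
            return Restore.NONE;
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
                return Restore.GUNZIP;
            case "pack200-gzip":
                // A pack file that was gzip compressed: gunzip first, then unpack.
                return Restore.GUNZIP_THEN_UNPACK200;
            default:
                throw new IllegalArgumentException(
                        "Unsupported Content-Encoding: " + contentEncoding);
        }
    }
}
```

A caller would read the Content-Encoding response header, pass its value to restoreFor, and then run the corresponding decoding steps on the response body.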

Example 1: Application requesting packed or compressed JAR

In Example 1, the client requests the file foo.jar with the Accept-Encoding field pack200-gzip,gzip. The server searches for the file foo.jar.pack.gz. If the server finds the file, it will send the file to the client and set the Content-Encoding field to pack200-gzip.

Example 2: Application requesting packed or compressed JAR

In Example 2, if the file foo.jar.pack.gz is not found, the server responds with the file foo.jar.gz, if it is found, and sets the Content-Encoding field to gzip.

Example 3: Application requesting packed or compressed JAR

In Example 3, if the files foo.jar.pack.gz and foo.jar.gz are not found, then the server responds with the file foo.jar and either does not set the Content-Encoding field or sets it to NULL.

Example 4: Legacy application requesting JAR

In Example 4, a legacy application (one that supports neither HTTP compression nor Pack200) requests the file foo.jar; consequently, this application will continue to work seamlessly. It is therefore recommended that you host all three files: foo.jar, foo.jar.gz, and foo.jar.pack.gz.
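The server-side lookup walked through in Examples 1 to 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the behavior of any particular web server: the names JarVariantSelector, Variant, and select are hypothetical, the set of available file names stands in for a file system lookup, and q-values in the Accept-Encoding header are ignored.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: picks which hosted variant of a JAR file to send. */
final class JarVariantSelector {

    /** File to send and the Content-Encoding value; a null encoding means "omit the header". */
    static final class Variant {
        final String fileName;
        final String contentEncoding;

        Variant(String fileName, String contentEncoding) {
            this.fileName = fileName;
            this.contentEncoding = contentEncoding;
        }
    }

    static Variant select(String jarName, String acceptEncoding, Set<String> available) {
        Set<String> accepted = parse(acceptEncoding);
        // Example 1: prefer the packed file when the client accepts pack200-gzip.
        if (accepted.contains("pack200-gzip") && available.contains(jarName + ".pack.gz")) {
            return new Variant(jarName + ".pack.gz", "pack200-gzip");
        }
        // Example 2: fall back to the gzip file.
        if (accepted.contains("gzip") && available.contains(jarName + ".gz")) {
            return new Variant(jarName + ".gz", "gzip");
        }
        // Examples 3 and 4: send the plain JAR file without a Content-Encoding header.
        return new Variant(jarName, null);
    }

    /** Splits a header such as "pack200-gzip,gzip" into lowercase tokens (q-values dropped). */
    static Set<String> parse(String header) {
        Set<String> tokens = new HashSet<>();
        if (header == null) {
            return tokens; // Legacy client: no Accept-Encoding header at all.
        }
        for (String part : header.split(",")) {
            String token = part.trim();
            int semicolon = token.indexOf(';');
            if (semicolon >= 0) {
                token = token.substring(0, semicolon).trim();
            }
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Because a client that sends no Accept-Encoding header always falls through to the plain file, hosting foo.jar alongside the compressed variants keeps legacy clients working.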

GZIP Compression

gzip is a freely available compressor that is provided within the JRE and the SDK as java.util.zip.GZIPInputStream and java.util.zip.GZIPOutputStream.

The command line versions are available with most Solaris, Linux, or Mac OS X operating systems, Windows UNIX toolkits (Cygwin and MKS Toolkit), or from http://www.gzip.org/.
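As a minimal sketch of the programmatic route, the following Java class compresses and restores a byte array with java.util.zip.GZIPOutputStream and java.util.zip.GZIPInputStream. The class name GzipRoundTrip and its method names are hypothetical; in a real deployment the bytes would be the contents of a JAR file read from disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper showing a gzip round trip on in-memory data. */
final class GzipRoundTrip {

    /** Returns the gzip-compressed form of the given bytes. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the original bytes from gzip-compressed input. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Calling gunzip(gzip(bytes)) returns a copy of the original bytes, which is the property the receiving application relies on when it restores a downloaded JAR file.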

gzip achieves the highest degree of compression when it is applied to an uncompressed JAR file rather than to a JAR file whose entries are already compressed. The downside is that the JAR file may be stored uncompressed on target systems.

For example, the download size can be reduced by 14% by compressing an uncompressed JAR file, compared to 3% by compressing a compressed JAR file.

Pack200 Compression

Pack200 compresses large files very efficiently, depending on the density and size of the class files in the JAR file. You can expect compression to one-ninth the size of the JAR file if it contains only class files and is on the order of several megabytes.

Using the same JAR file as in the previous example, Pack200 can reduce the file size by 50%.

Pack200 works most efficiently on Java class files. It uses several techniques to efficiently reduce the size of JAR files:

Compress and uncompress JAR files with the command line interfaces pack200 and unpack200 in the bin directory of your SDK or JRE directory.

You can also programmatically invoke Pack200 interfaces; see java.util.jar.Pack200.

Steps to Pack a File

1. Consider the size of the JAR file, the contents of the JAR file, and the bandwidth of your target audience.

All these factors play into choosing a compression technique. The unpack200 tool is designed to be as efficient as possible, and it takes little time to restore the original file. If you have large JAR files (2 MB or more) comprised mostly of class files, Pack200 is the preferred compression technique. If you have large JAR files which are comprised of resource files (JPEG, GIF, data, etc.), then gzip is the preferred compression technique.
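The guidance above can be expressed as a small decision helper. This is only a sketch: the class name CompressionAdvisor is hypothetical, "mostly" is assumed to mean more than half of the bytes, and falling back to gzip for JAR files smaller than 2 MB is an assumption, since the text does not cover that case. Bandwidth of the target audience is not modeled.

```java
/** Hypothetical helper that encodes the size and content guidance as a rule of thumb. */
final class CompressionAdvisor {

    enum Technique { PACK200, GZIP }

    /** "2 MB or more" from the guidance above. */
    static final long LARGE_JAR_BYTES = 2L * 1024 * 1024;

    /**
     * @param jarBytes   total size of the JAR file in bytes
     * @param classBytes number of those bytes that belong to class files
     */
    static Technique recommend(long jarBytes, long classBytes) {
        if (jarBytes < 0 || classBytes < 0 || classBytes > jarBytes) {
            throw new IllegalArgumentException("Inconsistent sizes");
        }
        boolean large = jarBytes >= LARGE_JAR_BYTES;
        // Assumption: "mostly class files" means more than half of the bytes.
        boolean mostlyClasses = classBytes * 2 > jarBytes;
        // Assumption: small JAR files and resource-heavy JAR files both fall back to gzip.
        return (large && mostlyClasses) ? Technique.PACK200 : Technique.GZIP;
    }
}
```

The thresholds are starting points only; measuring the packed and gzip sizes of your own JAR files remains the reliable way to choose.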

2. Specify the segment limit for Pack200 compression.

Pack200 loads the entire compressed file into memory. However, when target systems are memory- and resource-constrained, setting Pack200.Packer.SEGMENT_LIMIT to a lower value reduces the memory requirements during compression and uncompression.

As a special case, a value of -1 will produce a single large segment with all input files, while a value of 0 will produce one segment for each class. Larger archive segments result in less fragmentation and better compression, but processing them requires more memory.

The default is -1, which means pack200 will always create a single-segment output file. In cases where extremely large output files are generated, you are strongly encouraged to use segmenting or to break up the input file into smaller JARs.

For example, a 10 MB JAR packed without this limit will typically pack about 10% smaller, but pack200 may require a larger Java heap (about ten times the segment limit).

3. Sign the JAR files.

Pack200 rearranges the contents of the resulting JAR file. The jarsigner tool hashes the contents of the class file and stores the hash in an encrypted digest in the manifest. When unpack200 uncompresses a file, the contents of the classes will be rearranged and thus invalidate the signature. Therefore, the JAR file must be normalized first using pack200 and unpack200, and thereafter signed.

Here's why this works: Any reordering pack200 does on any class file structures is idempotent, so the second time it is compressed, it does not change the orderings produced by the first compression. Also, unpack200 is guaranteed by the JSR 200 specification to produce a specific bytewise image for any given transmission ordering of archive elements.

For example, suppose you want to use HelloWorld.jar:

  1. Recompress, or repack, the file to normalize the JAR file; repacking uncompresses and compresses the JAR file in one step.

    % pack200 --repack HelloWorld.jar
  2. Sign the JAR.

    % jarsigner -keystore myKeystore HelloWorld.jar user_name

    Note: You must sign the repacked file with the same key that was used when building the original JAR file. Alternatively, delete all signature files found in the META-INF directory before repacking, re-signing and verifying. The signature files are named MANIFEST.MF, *.DSA and *.SF.

    Verify the newly signed JAR file to ensure that the signing worked.

    % jarsigner -verify HelloWorld.jar
    jar verified.

    Ensure the JAR file still works.

    % java -jar HelloWorld.jar
    HelloWorld
  3. Compress the JAR file with pack200.

    % pack200 HelloWorld.jar.pack.gz HelloWorld.jar

    Note: You must compress the JAR file with the same options that you used to repack the file to normalize the JAR file, as demonstrated in step 1. Additionally, you must set the segment limit to -1 (unlimited) for all packing steps when using JDK 6 and earlier releases to prevent accidental variations of segment boundaries; class file sizes can change slightly under these circumstances, thus disrupting signatures. The default segment limit for JDK 7 and later is -1.

  4. Uncompress the file with unpack200.

    % unpack200 HelloWorld.jar.pack.gz HelloT1.jar
  5. Verify the JAR file.

    % jarsigner -verify HelloT1.jar
    jar verified.

    Test the JAR file.

    % java -jar HelloT1.jar
    HelloWorld

After verification, you can deploy the compressed pack file HelloWorld.jar.pack.gz.

4. Apply reduction techniques

Pack200 by default behaves in a High Fidelity (Hi-Fi) mode, meaning all the original attributes present in the classes as well as the attributes of each individual entry in a JAR file are retained. These attributes tend to add to the packed file size; here are some of the techniques you can use to further reduce the size of the download:

  1. Modification times: If modification time of the individual entries in a JAR file is not a concern, you can specify the option Pack200.Packer.MODIFICATION_TIME="LATEST". This will allow one modification time to be transmitted in the pack file for each segment. The latest time will be the latest time of any entry within that segment.

  2. Deflation hint: Similar to setting the modification time to "LATEST", if the compression state of the individual entries in the archive is not required, set Pack200.Packer.DEFLATION_HINT="false". This will fractionally reduce the download size, as individual compression hints will not be transmitted. However, the JAR file when recomposed will contain "stored" entries and hence may consume more disk space on the target system.

    For example:

    pack200 --modification-time=latest --deflate-hint="true" tools-md.jar.pack.gz tools.jar

    Note: the above optimizations will yield better results with a JAR file containing thousands of entries.

  3. Attributes: Several class attributes are not required when deploying JAR files. These attributes can be stripped out of class files, significantly reducing download size. However, care must be taken to ensure that required runtime attributes are maintained.

    1. Debugging attributes: If debugging information, such as Line Numbers and Source File, is not required (it is typically used in application stack traces), then these attributes can be discarded by specifying Pack200.Packer.STRIP_DEBUG=true. This typically reduces the packed file by about 10%.

      Example:

      pack200 --strip-debug tools-stripped.jar.pack.gz tools.jar
    2. Other attributes: Advanced users may use some of the other strip-related properties to strip out additional attributes. However, extreme caution should be used when doing so; the resultant JAR file must be tested on all possible Java runtime systems to ensure that the runtime does not depend on the stripped attributes.

5. Handle unknown attributes:

Pack200 deals with standard attributes defined by the Java Virtual Machine Specification; however compilers are free to introduce custom attributes. When such attributes are present, by default, Pack200 passes through the class, emitting a warning message. These "passed-through" class files may contribute to bloating of packed files. If the unknown attributes are prevalent in the classes of a JAR file, this may lead to a very large bloat in the compressed output. In such cases, consider the following strategies:

Note: When signing large JAR files, this step may fail with a security error. A likely cause is bug 5078608. Use one of the following workarounds:

6. Take advantage of Pack200 in your installation program

You may wish to take advantage of the Pack200 technology in your installation program, whereby a product's JAR files are compressed using Pack200 and uncompressed during installation. If the JRE or SDK is bundled in the installation, you are free to use the unpack200 (Solaris, Linux, or Mac OS X) or unpack200.exe (Windows) tool in the distribution bin directory. This implementation is a pure C++ application requiring no Java runtime to be present for it to run.

Windows: Installers may use a better algorithm than the one in GZIP to compress entries. In such cases, one will get better compression using the Installer's intrinsic compression, by using the pack200 tool as follows:

pack200 --no-gzip foo.jar.pack foo.jar

This will prevent the output file from being gzip compressed.

unpack200 is a Windows console application; that is, it displays an MS-DOS window during the install. To suppress this window, use a launcher with a WinMain entry point, as shown below.

Sample Code:

#include <windows.h>
#include <string.h>   /* memset */

int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow) {
  // lpCmdLine is an ANSI string (LPSTR), so use the ANSI variants explicitly;
  // this also compiles when UNICODE is defined.
  STARTUPINFOA si;
  memset(&si, 0, sizeof(si));
  si.cb = sizeof(si);

  PROCESS_INFORMATION pi;
  memset(&pi, 0, sizeof(pi));

  //Test
  //lpCmdLine = "c:/build/windows-i586/bin/unpack200 -l c:/Temp/log c:/Temp/rt.pack c:/Temp/rt.jar";
  BOOL ok = CreateProcessA(NULL,                /* Exec. name: taken from cmd line */
                           lpCmdLine,           /* cmd line */
                           NULL,                /* proc. sec. attr. */
                           NULL,                /* thread sec. attr */
                           TRUE,                /* inherit file handles */
                           CREATE_NO_WINDOW | DETACHED_PROCESS, /* detach the process/suppress console */
                           NULL,                /* env block: inherit from parent */
                           NULL,                /* cwd: inherit from parent */
                           &si,                 /* startup info */
                           &pi);                /* process info */
  if (!ok) ExitProcess(255);

  // Wait until child process exits.
  WaitForSingleObject(pi.hProcess, INFINITE);

  DWORD exit_val;

  // Be conservative and report failure if the exit code cannot be read.
  if (GetExitCodeProcess(pi.hProcess, &exit_val) == 0) ExitProcess(255);

  ExitProcess(exit_val); // Return the exit code of the child process

  return -1; // Not reached; ExitProcess does not return
}

Testing

It is required that all JAR files, compressed and uncompressed, be tested for correctness with your application's test qualifiers. When using the command line interface pack200, the output file will be compressed using gzip with default values. Alternatively, a user may create a simple pack file and compress it using gzip with user-specified options or using some other compressor.

More Information

For more information, see pack200 and unpack200 in Java Deployment Tools.

Updates in Java Standard Edition 6

In Java SE 6, the Java class file format has been updated. For more information see JSR 202: Java Class File Specification Update. Due to JSR 202, the Pack200 engine needs to be updated accordingly for the following reasons:

To keep the changes minimal and seamless for users, pack200 will generate appropriately versioned pack files based on the version of the input class files.

Also to maintain backward compatibility, if the input JAR files are solely comprised of JDK 5 or older class files, a JDK 5 compatible pack file is produced. Otherwise a Java SE 6 compatible Pack200 file is produced. For more information, refer to the pack200 man page for Solaris, Linux, Mac OS X, or Windows.


Copyright © 1993, 2014, Oracle and/or its affiliates. All rights reserved.