Optimize PNG size

Binary Adventures

We were faced with a web application that generates its content from data stored in a Microsoft SQL Server database, including image files (PNG) which were stored in a VARBINARY(MAX) column.

Because of the number of files and the storage used by the table, any option to reduce the size of the binaries was welcome. One approach was lossless compression, which reduces the file size without sacrificing image quality.

Approach

PNG file size can be reduced through:

  • Bit depth reduction
  • Color type reduction
  • Color palette reduction
  • Alpha channel reduction

More information regarding these techniques is available at http://optipng.sourceforge.net/pngtech/optipng.html. What’s important is that this is a lossless optimization. Thus, the original and the optimized version are visually indistinguishable.
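To see which of the reductions listed above were applied to a given file, it can help to compare the bit depth and color type before and after optimization. Both values are stored in the IHDR chunk at the start of every PNG file. The following is a minimal sketch using only the standard library; the function name `read_png_header` and the returned dictionary layout are illustrative and not part of the original application.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes as defined by the PNG specification.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor (RGB)",
    3: "indexed (palette)",
    4: "grayscale + alpha",
    6: "truecolor + alpha (RGBA)",
}


def read_png_header(data):
    """Return width, height, bit depth and color type from PNG bytes.

    Illustrative helper: reads only the IHDR chunk, which directly
    follows the signature, and does not validate the chunk CRC.
    """
    if len(data) < 26 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk missing")
    # IHDR data starts at offset 16: width and height (4 bytes each, big-endian),
    # followed by bit depth and color type (1 byte each).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, "unknown"),
    }
```

Calling this on the bytes of a file before and after optimization shows, for example, whether an RGBA image was reduced to an indexed palette. Since the images here live in a VARBINARY(MAX) column, the same function could be applied to the column value directly.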

Several optimizers are available, but we had specific requirements in order to fit the optimization into the existing process:

  • Batch processing – most optimizers have a GUI; we need a CLI tool because the compression has to fit into an existing process.
  • Multi-processing – due to the number of PNG files that need to be optimized in each run, we need to parallelize the process as much as possible.
  • Free – there was no budget available to buy a commercial optimizer.

A lot of optimizers also suffer from being rather immature or lacking documentation, making them unsuitable for use in a production environment.

Finally, we settled on OptiPNG. The only requirement it failed was multi-processing, as it doesn’t offer any such features. The workaround was to drive the OptiPNG process from a Python application, using the concurrent.futures module to run several OptiPNG processes in parallel.

An excerpt from the code:

import concurrent.futures
import timeit
from pathlib import Path

# log, resource_path, convert and calc_total_size are defined elsewhere
# in the application and are not part of this excerpt.

def main(folder, workers=4):
    exe_crush = Path(resource_path('bin'), 'optipng.exe')
    cmd_crush = '"' + str(exe_crush) + '" -clobber -quiet "{}"'
    log.debug('Command to run: {}'.format(cmd_crush))

    # Get all PNG files in the folder and subfolders
    png_files = list(Path(folder).glob("**/*.png"))
    # 1048576 = 1024 * 1024; the log messages below report the result as MB
    size_before = calc_total_size(png_files) // 1048576

    log.info('Crushing {} PNG files...'.format(len(png_files)))

    # Each worker thread runs one OptiPNG process at a time
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        start_time = timeit.default_timer()
        fs = [executor.submit(convert, cmd_crush, png) for png in png_files]
        log.info('Waiting for process to complete...')
        concurrent.futures.wait(fs)
        end_time = timeit.default_timer()
        log.info('Done!')

    size_after = calc_total_size(png_files) // 1048576

    log.info('Crushed: {} MB >> {} MB'.format(size_before, size_after))
    log.info('Saved: {} MB'.format(size_before - size_after))
    log.info('Time elapsed: {} seconds'.format(end_time - start_time))
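The excerpt calls two helpers, convert and calc_total_size, that are not shown. What follows is a rough sketch of what they might look like, not the original implementation. It assumes that convert fills the file path into the command template and runs it as an external process, and that calc_total_size returns the combined file size in bytes.

```python
import os
import shlex
import subprocess
from pathlib import Path


def convert(cmd_template, png_path):
    """Run the optimizer command for one file and return its exit code.

    Hypothetical helper: the original implementation is not shown.
    """
    cmd = cmd_template.format(png_path)
    # On Windows the quoted command string can be passed as-is;
    # on other platforms it has to be split into an argument list.
    args = cmd if os.name == "nt" else shlex.split(cmd)
    result = subprocess.run(
        args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def calc_total_size(paths):
    """Return the combined size in bytes of the given files.

    Hypothetical helper: the original implementation is not shown.
    """
    return sum(Path(p).stat().st_size for p in paths)
```

Under these assumptions a thread pool is sufficient: each worker thread spends its time waiting for an external OptiPNG process, so the actual compression runs in parallel outside the Python interpreter.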

The initial test results look promising:

Test results