Lossless JPEG Metadata Removal Through Binary Manipulation

When you take a digital photo, the resulting JPEG isn't just a faithful representation of the scene. It usually also carries metadata: GPS coordinates, timestamps, camera settings, and device identifiers. Stripping this metadata carelessly can corrupt the file.

I learned this while building EXIF Scrubber. My first attempts used canvas-based re-encoding. Users complained about quality loss immediately. I needed a better approach. Here’s what I figured out over two weeks of debugging, reading specs, and fixing edge cases.

Re-encoding Destroys Quality

My first attempt loaded images into an HTML canvas, then exported them as new JPEGs. Clean, straightforward code. Completely wrong approach.

My scrubbed versions looked softer, often darker, or had a different color profile.

JPEG is a lossy format. Each compression cycle degrades quality. Even at maximum quality settings, you lose information. It’s like photocopying a photocopy. Each generation gets worse.

I needed to strip metadata without decompressing the image data. That meant parsing JPEGs at the binary level.

How JPEGs Work

Every JPEG starts with a Start of Image (SOI) marker: 0xFF 0xD8. These two bytes identify the file as a JPEG.

What follows is a sequence of segments, each marked by a two-byte identifier starting with 0xFF. Here’s a typical structure:

0xFF 0xD8        SOI (Start of Image)
0xFF 0xE0        APP0 (JFIF header)
0xFF 0xE1        APP1 (EXIF data)
0xFF 0xDB        DQT (Quantization table)
0xFF 0xC0        SOF (Start of Frame)
0xFF 0xDA        SOS (Start of Scan)
[compressed image data...]
0xFF 0xD9        EOI (End of Image)

Except for SOI, EOI, and the restart markers we'll meet later, every segment follows this pattern:

0xFF [marker] [length-high-byte] [length-low-byte] [segment data...]

The length includes itself but not the marker bytes. If a segment has a length of 0x00 0x10, you read 16 bytes total: 2 for length, 14 for data.
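To make the arithmetic concrete, here's a minimal sketch of computing where a segment ends, given a DataView over the file and the offset of the segment's 0xFF byte (the helper name is mine, not from EXIF Scrubber):

// Sketch: where does the segment starting at `offset` end?
// JPEG lengths are big-endian, which getUint16 reads by default,
// and include the two length bytes but not the two marker bytes.
function segmentEnd(view, offset) {
    const length = view.getUint16(offset + 2);
    return offset + 2 + length;
}

For the 0x00 0x10 example above, segmentEnd returns offset + 18: two marker bytes plus sixteen length-and-data bytes.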

This self-describing structure makes parsing straightforward once you understand the pattern. I spent a week reading the JPEG spec before I felt comfortable writing code.

Where Metadata Hides

JPEG stores metadata in APP segments, numbered APP0 through APP15 (markers 0xFFE0 through 0xFFEF). Different applications claim different segments:

APP0 (0xE0) holds JFIF headers that define color space and pixel aspect ratio. Keep this one. I learned this after users reported color shifts in scrubbed images. Without APP0, some software misinterprets colors entirely.

APP1 (0xE1) contains EXIF data: camera settings, GPS coordinates, timestamps, device IDs. This is your main target. Remove APP1, and you eliminate most privacy concerns.

APP2 (0xE2) stores ICC profiles that define color management. These can reveal processing software, so I strip them for complete anonymity.

APP13 (0xED) holds IPTC data: captions, keywords, author information. News agencies use this format heavily.

APP14 (0xEE) contains Adobe-specific information: color transforms and version details that reveal which Adobe tools touched the file.

The remaining APP segments (APP3 through APP12, APP15) are rare but might hold proprietary data. EXIF Scrubber removes them all.
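Expressed as a predicate, that policy is tiny: keep APP0, strip APP1 through APP15. A sketch (the function name is illustrative, not EXIF Scrubber's API):

// Sketch: should this APP segment be stripped?
// 0xE0 (APP0/JFIF) survives; 0xE1 through 0xEF are removed.
function isStrippableApp(marker) {
    return marker >= 0xE1 && marker <= 0xEF;
}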

Building the Parser

I built the first version in JavaScript for the web, then ported the logic to F# for the cross-platform desktop app. The process follows four steps: validation, parsing, filtering, reconstruction.

Validate the JPEG

Check the first two bytes. If they’re not 0xFF 0xD8, bail out. Not a JPEG.

function scrubJpeg(arrayBuffer) {
    const view = new DataView(arrayBuffer);

    if (view.getUint8(0) !== 0xFF || view.getUint8(1) !== 0xD8) {
        throw new Error('Invalid JPEG: missing SOI marker');
    }

    const segments = [new Uint8Array([0xFF, 0xD8])];
    let offset = 2;

    // Continue parsing...
}

I added this validation after a user tried to scrub a PNG with a .jpg extension. The parser crashed halfway through with a cryptic error. Better to fail immediately with a clear message.

Parse Segments

Walk through the file using a cursor (offset). For each marker, read its length and decide whether to keep or discard it.

while (offset < arrayBuffer.byteLength) {
    if (view.getUint8(offset) !== 0xFF) {
        throw new Error('Expected marker at offset ' + offset);
    }

    const marker = view.getUint8(offset + 1);

    // Handle special markers without lengths
    if (marker === 0xD8 || marker === 0xD9 || 
        (marker >= 0xD0 && marker <= 0xD7)) {
        segments.push(new Uint8Array([0xFF, marker]));
        offset += 2;
        continue;
    }

    const length = view.getUint16(offset + 2);
    const segmentData = new Uint8Array(
        arrayBuffer.slice(offset, offset + 2 + length)
    );

    offset += 2 + length;

    // Decide which segments to keep...
}

The special case for SOI, EOI, and RST markers matters because these don’t have length fields. They’re always exactly two bytes. I discovered this when my parser kept throwing errors on progressive JPEGs with restart markers.

Filter Segments

Maintain an allowlist of essential markers and discard everything else. This is safer than a blocklist because unknown markers get removed by default.

const KEEP_MARKERS = new Set([
    0xD8, 0xD9,                          // SOI, EOI
    0xC0, 0xC1, 0xC2, 0xC3,             // SOF variants
    0xC5, 0xC6, 0xC7,
    0xC9, 0xCA, 0xCB,
    0xCD, 0xCE, 0xCF,
    0xDB,                                // DQT (Quantization Table)
    0xC4,                                // DHT (Huffman Table)
    0xDD,                                // DRI (Restart Interval, needed when RST markers appear)
    0xDA,                                // SOS (Start of Scan)
    0xE0,                                // APP0 (JFIF)
]);

// In the parsing loop:
if (KEEP_MARKERS.has(marker)) {
    segments.push(segmentData);
}

Notice we keep APP0 but discard APP1 through APP15. I initially removed APP0 too, thinking all APP segments were metadata. Users reported bizarre color rendering. Some images looked overly saturated, others appeared washed out. Turns out APP0 contains JFIF headers that define basic image properties.

Handle Image Data

After the SOS (Start of Scan) marker, compressed image data runs until EOI (0xFF 0xD9). Copy all of it unchanged.

if (marker === 0xDA) {
    segments.push(segmentData);
    const imageDataStart = offset;

    // Scan forward to the EOI marker. Stuffed bytes (0xFF 0x00) and
    // restart markers (0xFF 0xD0-0xD7) never match 0xFF 0xD9, so
    // entropy-coded data can't trigger this check early.
    while (offset < arrayBuffer.byteLength - 1) {
        if (view.getUint8(offset) === 0xFF &&
            view.getUint8(offset + 1) === 0xD9) {
            break;
        }
        offset++;
    }

    // Copy every scan byte exactly as-is, up to (not including) EOI.
    segments.push(new Uint8Array(
        arrayBuffer.slice(imageDataStart, offset)
    ));
}

This is the critical part. The compressed image data contains no metadata. It’s pure pixel information. EXIF Scrubber copies every byte exactly as it appears, and the image stays pixel-perfect.

Reconstruct the JPEG

Concatenate preserved segments into a single Uint8Array and create a Blob.

const totalLength = segments.reduce((sum, seg) => sum + seg.length, 0);
const output = new Uint8Array(totalLength);

let position = 0;
for (const segment of segments) {
    output.set(segment, position);
    position += segment.length;
}

return new Blob([output], { type: 'image/jpeg' });

The result is a valid JPEG with identical image data but no metadata. Users consistently report zero quality loss when they compare originals to scrubbed versions.
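Here's roughly what a call site looks like in the browser, assuming the scrubJpeg function built up above; the download plumbing is just one way to hand the result back to the user:

// Example usage: scrub a File from an <input type="file"> element
// and offer the cleaned JPEG as a download.
async function scrubAndDownload(file) {
    const buffer = await file.arrayBuffer();
    const blob = scrubJpeg(buffer);

    const url = URL.createObjectURL(blob);
    const link = document.createElement('a');
    link.href = url;
    link.download = file.name.replace(/\.jpe?g$/i, '') + '-scrubbed.jpg';
    link.click();
    URL.revokeObjectURL(url);
}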

Edge Cases That Bit Me

Real-world JPEG files are messy. Camera manufacturers take liberties with the spec. Here are the edge cases that caused problems.

Marker collisions nearly broke my first implementation. Image data sometimes contains byte sequences that look like markers. The JPEG spec requires these to be escaped as 0xFF 0x00, but I didn’t handle this initially. My parser treated these as segment markers and truncated images prematurely. A user sent me a corrupted landscape photo, and I spent hours debugging before I found the issue.
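The rule that fixes it: inside a scan, 0xFF 0x00 is a stuffed data byte and 0xFF 0xD0 through 0xFF 0xD7 are restart markers; only 0xFF followed by anything else is a real marker. A sketch of a scan-end search that honors this (my illustration, not the exact EXIF Scrubber code):

// Sketch: find where the entropy-coded scan data ends.
// 0xFF 0x00 = stuffed data byte, 0xFF 0xD0-0xD7 = restart marker;
// both belong to the scan. Anything else after 0xFF is a real marker.
function findScanEnd(view, offset) {
    while (offset < view.byteLength - 1) {
        if (view.getUint8(offset) === 0xFF) {
            const next = view.getUint8(offset + 1);
            if (next !== 0x00 && !(next >= 0xD0 && next <= 0xD7)) {
                return offset; // a genuine marker: the scan stops here
            }
            offset += 2;       // stuffed byte or RST: still scan data
        } else {
            offset++;
        }
    }
    return view.byteLength;    // ran off the end without a marker
}

For progressive files this stops at the next real marker (often another DHT or SOS) rather than at EOI, which is exactly the behavior the next edge case needs.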

Progressive JPEGs use multiple scans (multiple SOS markers) to display images at increasing quality levels. My early parser only copied the first scan. Users complained about corrupted output that looked blocky and incomplete. I fixed this by continuing to parse after the first SOS marker, capturing all scans.

Embedded thumbnails appear in APP1 (EXIF) more often than I expected. Some cameras store massive thumbnails, 100KB or more. Removing the entire APP1 segment eliminates these automatically, which is exactly what users want for privacy. File sizes drop noticeably.

Comment segments (0xFFFE) can contain text metadata. I remove them by default in EXIF Scrubber, but I’m considering adding an option to preserve user comments while stripping technical metadata. Some photographers add workflow notes in these segments.

Testing and Validation

I built a test suite using real photos from dozens of cameras and phones: iPhone 12, Canon 5D Mark IV, Nikon D850, Google Pixel 6, and an assortment of random Android phones. Here's what I check:

File size should decrease (unless the input had no metadata). If file size increases, binary parsing failed somewhere. This catches most errors immediately.

Image integrity means the scrubbed file displays identically to the original. I use pixel-perfect comparison tools for this. EXIF Scrubber’s binary method passes every time. No differences, not even one pixel.
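If you'd rather script that check in the browser than reach for a dedicated tool, here's a rough sketch (createImageBitmap and OffscreenCanvas are assumptions about the environment, not part of EXIF Scrubber):

// Sketch: decode both files and compare them pixel for pixel.
async function pixelsMatch(originalBlob, scrubbedBlob) {
    const [a, b] = await Promise.all([
        createImageBitmap(originalBlob),
        createImageBitmap(scrubbedBlob),
    ]);
    if (a.width !== b.width || a.height !== b.height) return false;

    const pixels = (bitmap) => {
        const canvas = new OffscreenCanvas(bitmap.width, bitmap.height);
        const ctx = canvas.getContext('2d');
        ctx.drawImage(bitmap, 0, 0);
        return ctx.getImageData(0, 0, bitmap.width, bitmap.height).data;
    };

    const pa = pixels(a), pb = pixels(b);
    return pa.every((value, i) => value === pb[i]);
}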

Metadata removal gets verified with exiftool -a -G1 image.jpg. The output should be empty except for basic JFIF information. EXIF Scrubber removes EXIF, IPTC, XMP, and ICC profiles completely.

Format validity requires parsing scrubbed JPEGs to ensure they follow the spec. Your parser should succeed without throwing errors. Invalid JPEGs won’t open in all software.
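One cheap way to automate this: the scrubber is itself a JPEG parser, so a scrubbed file should survive a second pass without throwing. A sketch:

// Sketch: reuse scrubJpeg as a structural validator.
async function isValidJpeg(blob) {
    try {
        scrubJpeg(await blob.arrayBuffer());
        return true;
    } catch {
        return false;
    }
}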

Why This Works

Zero quality loss because pixels never decompress. I can prove this with binary comparison tools. The image data in scrubbed files matches the original byte for byte.

Predictable results mean binary manipulation either succeeds completely or fails cleanly. No partial metadata removal, no subtle corruption. Users know exactly what they’re getting.

Smaller files result from removing metadata. EXIF data adds 50KB on average, sometimes 200KB on professional camera photos. Users notice the difference when uploading batches of images.

Speed comes from copying bytes instead of decoding and re-encoding. EXIF Scrubber processes thousands of images per minute on a typical laptop. Canvas-based approaches take 10x longer.

Privacy is the core benefit. Binary parsing runs entirely offline. EXIF Scrubber never uploads your images anywhere. The desktop app works without an internet connection. Your photos stay on your machine.

Try EXIF Scrubber

I built EXIF Scrubber to solve my own privacy concerns when sharing photos online. The desktop app runs on macOS, Windows, and Linux. It uses the binary parsing technique described here for JPEGs, plus similar approaches for PNG and WebP files.

Key features:

  • Drag and drop batch processing for hundreds of files at once
  • Zero quality loss through binary manipulation
  • Completely offline operation, no internet required
  • Support for JPEG, PNG, WebP, and HEIC formats
  • Preserves original file timestamps and directory structure

The app is free to download and use. If you’re sharing photos publicly or want control over what metadata leaves your device, give it a try.

I previously wrote about PNG chunk manipulation, which uses a chunk-based format rather than segments. That requires a different parsing strategy but follows similar principles. Check it out if you’re curious about how other image formats handle metadata.

S. Potter

Indie developer of EXIF Scrubber and recovering Site Reliability Engineer with a penchant for functional programming.