Imperative vs Declarative Parsing

Decades ago, at a fintech company, I spent hours debugging a market-data parser that produced corrupted output for a small subset of files. The bug? In one specific scenario I’d forgotten to increment an offset variable after reading a 4-byte length field. The parser read values far outside the expected range and crashed in production, but only for one variant of an uncommon type of financial security that our tests didn’t cover.

That debugging session left a bad taste in my mouth and made me wonder whether there was a better way to parse formats that change over time without fiddling with offset variables.

That evening I wondered: why can’t we just describe the shape of the data and let a library handle the mechanics, decomposing the problem into smaller, composable parts? It turns out we can, with parser combinators, but at the time I didn’t know they existed.

To demonstrate what I mean, I’ll parse a PNG file’s signature and first chunk side by side using each approach.

The Imperative Approach in JavaScript

JavaScript developers often reach for DataView when reading binary formats. It lets us create a view over a buffer without copying it, read typed values at specific byte offsets, and track our position manually.
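
If DataView is unfamiliar, here is a minimal illustration of the two calls the parser below relies on. Note that getUint32 reads big-endian by default, which happens to match PNG’s byte order:

// DataView reads multi-byte values big-endian unless we pass `true`
// as the second argument (e.g. getUint32(offset, true) for little-endian).
const bytes = new Uint8Array([0x00, 0x00, 0x00, 0x0d]);
const view = new DataView(bytes.buffer);
console.log(view.getUint8(0));   // 0   (single byte at offset 0)
console.log(view.getUint32(0));  // 13  (4 bytes at offset 0, big-endian)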

Below is how we might parse it in JavaScript, wrapping a Uint8Array in a DataView:

// Assume `buffer` is a Uint8Array containing a PNG file
function readPngHeaderAndFirstChunk(buffer) {
  const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  
  // PNG files start with an 8-byte signature to identify the format
  if (buffer.length < 8) throw new Error("Unexpected EOF");
  const pngSignature = [137, 80, 78, 71, 13, 10, 26, 10];
  for (let i = 0; i < 8; i++) {
    if (view.getUint8(i) !== pngSignature[i]) {
      throw new Error("Invalid PNG signature");
    }
  }
  let offset = 8;  // track position manually

  // Read chunk header: length (4 bytes) + type (4 ASCII chars)
  if (offset + 8 > buffer.length) throw new Error("Unexpected EOF");
  const length = view.getUint32(offset);
  const type = String.fromCharCode(
    view.getUint8(offset + 4),
    view.getUint8(offset + 5),
    view.getUint8(offset + 6),
    view.getUint8(offset + 7)
  );
  offset += 8;  // don't forget this or everything breaks

  // Read the actual chunk data
  if (offset + length + 4 > buffer.length) throw new Error("Unexpected EOF");
  const data = buffer.slice(offset, offset + length);
  offset += length;

  // Read CRC checksum (4 bytes)
  const crc = view.getUint32(offset);
  offset += 4;

  return { type, length, data, crc, nextOffset: offset };
}

What makes this imperative?

We’re commanding the computer step by step. “Go to byte 8. Read 4 bytes. Move forward 4 bytes. Check if we’re past the end.”

Notice how offset appears everywhere. Managing that variable is the application developer’s responsibility. Increment it incorrectly once, or forget to, and the parser reads from the wrong location. Forget a bounds check, and truncated input either throws an opaque range error or quietly yields less data than the format promised.
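
To make that failure mode concrete, here is a deliberately tiny, hypothetical example (not from the PNG parser above) of what a single forgotten increment does:

// Hypothetical demonstration: reading two 4-byte fields but forgetting to
// advance the offset after the first one.
function readTwoUint32s(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 0;
  const first = view.getUint32(offset);
  // offset += 4;   // forgetting this single line...
  const second = view.getUint32(offset);  // ...silently re-reads the first field
  return { first, second };
}

// readTwoUint32s(new Uint8Array([0, 0, 0, 1, 0, 0, 0, 2])) returns
// { first: 1, second: 1 } instead of { first: 1, second: 2 }.
// No error is thrown; wrong data just flows downstream.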

I’ve written many such parsers and they worked, but they’re fragile: whenever I needed to accommodate a new field, I had to be very careful not to break existing functionality. Automated tests helped, but the changes were cumbersome. This style can still work well for formats that rarely or never change.

The Declarative Approach with Parser Combinators

Parser combinators flip the script. Instead of telling the computer how to read bytes, we describe what the data format looks like. Small parsing functions combine into larger ones. The library handles position tracking and bounds checking.

Here’s the same PNG parser in PureScript:

-- Helper functions read typed data and advance the position
-- * `anyUInt32BE` reads 4 bytes as a big-endian unsigned integer
-- * `anyUInt8` reads one byte
-- * `char` converts a byte to a Char
-- * the position advances automatically after each read

parsePngHeaderAndFirstChunk :: Parser
  { chunkType :: String
  , length :: Int
  , data :: Array UInt8
  , crc :: UInt32 }
parsePngHeaderAndFirstChunk = do
  -- Verify the 8-byte PNG signature
  signature <- count 8 anyUInt8
  unless (signature == [137,80,78,71,13,10,26,10]) $
    fail "Invalid PNG signature"

  -- Read chunk metadata
  length <- anyUInt32BE
  c1 <- anyUInt8
  c2 <- anyUInt8
  c3 <- anyUInt8
  c4 <- anyUInt8
  let chunkType = fromCharArray [ char c1, char c2, char c3, char c4 ]

  -- Read chunk payload
  chunkData <- count length anyUInt8

  -- Read CRC
  crc <- anyUInt32BE

  pure { chunkType, length, data: chunkData, crc }

What makes this declarative?

We’re describing structure, not issuing commands. “The data contains a signature, then a length, then a type.” The parser figures out the mechanics.

Much of the bookkeeping is handled by the library: low-level details like bounds checking and position tracking are no longer the application developer’s responsibility.
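
To demystify what “handled by the library” means, here is a minimal sketch in plain JavaScript of how a combinator library could thread position state. The names u8, u32be, and count are made up for illustration and are not any particular library’s API:

// Minimal sketch: a parser is a function (bytes, offset) -> { value, next }.
// Bounds checks and offset arithmetic live here, once, instead of being
// repeated at every call site.
const u8 = (bytes, offset) => {
  if (offset + 1 > bytes.length) throw new Error(`Unexpected EOF at byte ${offset}`);
  return { value: bytes[offset], next: offset + 1 };
};

const u32be = (bytes, offset) => {
  if (offset + 4 > bytes.length) throw new Error(`Unexpected EOF at byte ${offset}`);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { value: view.getUint32(offset), next: offset + 4 };
};

// `count` builds a new parser out of an existing one: run it n times,
// collecting the results and threading the offset automatically.
const count = (n, parser) => (bytes, offset) => {
  const values = [];
  let pos = offset;
  for (let i = 0; i < n; i++) {
    const result = parser(bytes, pos);
    values.push(result.value);
    pos = result.next;
  }
  return { value: values, next: pos };
};

// Application code describes shape and never mentions offsets:
// const { value: signature } = count(8, u8)(pngBytes, 0);

Real libraries wrap this state threading in a monad or a fluent interface, but the division of labour is the same: offsets and bounds checks live in the combinators once, while application code only describes the shape of the data.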

This syntax is foreign to many JavaScript developers. But after converting a few parsers from imperative to declarative style, I stopped going back for data formats that are likely to change over time. The reduction in cognitive load is significant, and it makes a real difference for maintenance.

What We Actually Get From Each Approach

Aspect | Imperative JS | Parser Combinators
Position tracking | Manual (offset += 4) | Handled by library
Bounds checking | Manual (if offset + n > length) | Built into each combinator
Error handling | Throw exceptions everywhere | Propagates through parser context
Mental model | “Move cursor, read byte, repeat” | “Data has this shape”
Debugging difficulty | Higher (off-by-one and bounds errors lurk) | Lower (structure mismatches are obvious)
Learning curve | Minimal for JS developers | Steep if we’re new to FP
Performance | Fast (direct memory access, but dependent on the application developer’s skill) | Competitive (for well-optimized libraries, so check this)

Why This Actually Matters

Parsing bugs can cause silent data corruption and security vulnerabilities when malformed input isn’t rejected properly.
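
One concrete example from the PNG case: the 4-byte chunk length is attacker-controlled input, and the PNG specification caps it at 2^31 - 1. Below is a sketch, with a hypothetical helper name, of the kind of validation either style needs before trusting that value:

// Sketch: validate a declared chunk length before using it.
// The PNG spec limits chunk length to 2^31 - 1; anything larger, or anything
// that would run past the end of the buffer (+4 for the trailing CRC),
// should be rejected up front rather than discovered mid-parse.
function assertValidChunkLength(length, offset, bufferLength) {
  if (length > 0x7fffffff) {
    throw new Error(`Chunk length ${length} exceeds PNG's limit of 2^31 - 1`);
  }
  if (offset + length + 4 > bufferLength) {
    throw new Error(`Chunk of ${length} bytes overruns the ${bufferLength}-byte buffer`);
  }
}

In the imperative version, that check is one more thing to remember at every read site; in the combinator version, it can be folded into the chunk parser once.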

Imperative parsing puts the burden on application developers. Every offset calculation. Every bounds check. Every error path. Miss one detail, and everything breaks.

Declarative parsing moves that burden to the library. We still need to understand our data format, but we’re freed from most of the bookkeeping. Our application code reads like a specification, and when the parser fails, it tells us which part of the structure didn’t match, largely for free.

I spent three hours debugging that offset bug. The parser combinator version would’ve failed immediately with “expected 4 bytes, got 2” at the exact location. That’s the difference.

The imperative approach isn’t wrong. It’s often more efficient, and it’s written in a style most JavaScript developers already know. But if you’ve ever lost hours to an off-by-one error in a parser, parser combinators deserve a look. They won’t eliminate all bugs, but they do eliminate a whole category of them, especially when the format you’re parsing keeps changing.

S. Potter

Indie developer of EXIF Scrubber and recovering Site Reliability Engineer with a penchant for functional programming.