20 October 2012

Checking File Preambles and Watermarks

I'm constantly having to check the preambles of files for some kind of signature (or "watermark") in order to identify or validate the file's format.

I've re-invented the wheel many times over while doing this. Time, I thought, for a little helper routine or two.

First, a little routine to check the next N bytes from a stream for a given sequence of bytes (the "watermark").

function StreamHasWatermark(const Stm: TStream;
  const Watermark: array of Byte): Boolean;
var
  StmPos: Int64;
  Buf: array of Byte;
  I: Integer;
begin
  Assert(Length(Watermark) > 0, 'No "watermark" specified');
  Result := False;
  // Remember where we started so the position can be restored on exit
  StmPos := Stm.Position;
  try
    // Not enough bytes left in the stream to hold the watermark
    if Stm.Size - StmPos < Length(Watermark) then
      Exit;
    SetLength(Buf, Length(Watermark));
    Stm.ReadBuffer(Pointer(Buf)^, Length(Buf));
    // Any mismatched byte means the watermark is not present
    for I := Low(Buf) to High(Buf) do
      if Buf[I] <> Watermark[I] then
        Exit;
    Result := True;
  finally
    Stm.Position := StmPos;
  end;
end;

Pass it a stream and an array containing the required "watermark" and the routine checks whether the watermark exists at the current position† in the stream. The original stream position is restored after checking. This means that the routine can be called more than once to test for different watermarks without having to worry about keeping track of the stream position.

† I designed the routine to check from the current stream position rather than the beginning of the stream because some of the files / streams may have the required sequence of bytes offset from the start.
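
To illustrate the point about repeated calls, here is a minimal console sketch that runs several checks against one in-memory stream. It is only an illustration under some assumptions: the program name WatermarkDemo, the SampleData bytes and the GIF signature used as a non-matching test are all made up for the example, and a compact copy of StreamHasWatermark is included purely so the program compiles on its own.

```pascal
program WatermarkDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Compact copy of the routine described above, so this sketch stands alone.
function StreamHasWatermark(const Stm: TStream;
  const Watermark: array of Byte): Boolean;
var
  StmPos: Int64;
  Buf: array of Byte;
  I: Integer;
begin
  Result := False;
  StmPos := Stm.Position;
  try
    if Stm.Size - StmPos < Length(Watermark) then
      Exit;
    SetLength(Buf, Length(Watermark));
    Stm.ReadBuffer(Pointer(Buf)^, Length(Buf));
    for I := Low(Buf) to High(Buf) do
      if Buf[I] <> Watermark[I] then
        Exit;
    Result := True;
  finally
    Stm.Position := StmPos;
  end;
end;

const
  // Hypothetical sample data: a PKZip signature followed by two filler bytes.
  SampleData: array[0..5] of Byte = ($50, $4B, $03, $04, $AA, $BB);
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(SampleData, SizeOf(SampleData));
    MS.Position := 0;
    // Two checks in a row: neither call leaves the stream position changed.
    WriteLn('GIF?   ', StreamHasWatermark(MS, [$47, $49, $46]));
    WriteLn('PKZip? ', StreamHasWatermark(MS, [$50, $4B, $03, $04]));
    WriteLn('Position after checks: ', MS.Position);
    // To check at an offset, move the stream position before calling.
    MS.Position := 4;
    WriteLn('AA BB at offset 4? ', StreamHasWatermark(MS, [$AA, $BB]));
  finally
    MS.Free;
  end;
end.
```

With this sample data the GIF check should fail, the PKZip check should succeed, the position should still be 0 after both calls, and the final check should succeed at offset 4.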

That's the core functionality taken care of, so how can we use the routine?

How about this generalised routine to check a file watermark or preamble?

function FileHasWatermark(const FileName: string;
  const Watermark: array of Byte; const Offset: Integer = 0): Boolean; overload;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    FS.Position := Offset;
    Result := StreamHasWatermark(FS, Watermark);
  finally
    FS.Free;
  end;
end;

This routine is pretty self-explanatory: it opens the named file and looks for the given sequence of bytes (Watermark) in it. It also has an optional parameter that lets you specify the offset of the watermark from the start of the file.

Quite often watermarks are specified as ASCII text, so I've created an overloaded version of the function that takes an ASCII (actually ANSI) watermark instead of an array of bytes. Here it is:

function FileHasWatermark(const FileName: string;
  const Watermark: AnsiString; const Offset: Integer = 0): Boolean; overload;
var
  Bytes: array of Byte;
  I: Integer;
begin
  // Copy the ANSI characters into a zero-based byte array
  SetLength(Bytes, Length(Watermark));
  for I := 1 to Length(Watermark) do
    Bytes[I - 1] := Ord(Watermark[I]);
  Result := FileHasWatermark(FileName, Bytes, Offset);
end;

Finally a few examples, all of which assume the name of the required file is in a string variable named FileName:

  • A zip file created by PKZip has preamble 50 4B 03 04 in hex. So a test for such a zip file could be:
    if FileHasWatermark(FileName, [$50, $4B, $03, $04]) then
      ShowMessage('PKZip file');
  • Some versions of the old style Windows help file have the byte sequence 00 00 FF FF FF FF at offset 6. The test is:
    if FileHasWatermark(FileName, [$00, $00, $FF, $FF, $FF, $FF], 6) then
      ShowMessage('WinHelp file');
  • I'm tinkering about with the Game of Life at the moment and have found there are two versions of the Life file format - 1.05 and 1.06 - which both use the .lif file extension but have different ASCII preambles. Here's a way to distinguish them using the ASCII overload of our function:
    if FileHasWatermark(FileName, '#Life 1.05') then
      ShowMessage('Life v1.05 file format')
    else if FileHasWatermark(FileName, '#Life 1.06') then
      ShowMessage('Life v1.06 file format')
    else
      ShowMessage('Invalid Life file format');

Hope that's useful to someone.

EDIT: Versions of these routines are now available from the Code Snippets Database.
