20 October 2012

Checking File Preambles and Watermarks

I'm constantly having to check the preambles of files for some kind of signature (or "watermark") to check or validate the file's format.

I've re-invented the wheel many times over while doing this. Time, I thought, for a little helper routine or two.

First, a little routine to check the next N bytes from a stream for a given sequence of bytes (the "watermark").

function StreamHasWatermark(const Stm: TStream;
  const Watermark: array of Byte): Boolean;
var
  StmPos: Int64;
  Buf: array of Byte;
  I: Integer;
begin
  Assert(Length(Watermark) > 0, 'No "watermark" specified');
  Result := False;
  StmPos := Stm.Position;
  try
    if Stm.Size - StmPos < Length(Watermark) then
      Exit;
    SetLength(Buf, Length(Watermark));
    Stm.ReadBuffer(Pointer(Buf)^, Length(Buf));
    for I := Low(Buf) to High(Buf) do
      if Buf[I] <> Watermark[I] then
        Exit;
    Result := True;
  finally
    Stm.Position := StmPos;
  end;
end;

Pass it a stream and an array containing the required "watermark" and the routine checks to see if the watermark exists at the current position† in the stream. The original stream position is restored after checking. This means that the routine can be called more than once to test for different watermarks without having worry about keeping track of the stream position.

† I designed the routine to check from the current stream position rather than the beginning of the stream because some of the files / streams my have have the required sequence of bytes offset from the start.

That's the core functionality taken care of, so how can we use the routine?

How about this generalised routine to check a file watermark or preamble?

function FileHasWatermark(const FileName: string;
  const Watermark: array of Byte; const Offset: Integer = 0): Boolean;
  overload;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    FS.Position := Offset;
    Result := StreamHasWatermark(FS, Watermark);
  finally
    FS.Free;
  end;
end;

This routine is pretty self explanatory: it looks for the given sequence of bytes (Watermark) in the named file. It also has an optional parameter that lets you specify the offset of the watermark in the file.

Quite often watermarks are specified as ASCII text, so I've created an overload function to take an ASCII (actually ANSI) watermark instead of an array of bytes. Here it is:

function FileHasWatermark(const FileName: string;
  const Watermark: AnsiString; const Offset: Integer = 0): Boolean;
  overload;
var
  Bytes: array of Byte;
  I: Integer;
begin
  SetLength(Bytes, Length(Watermark));
  for I := 1 to Length(Watermark) do
    Bytes[I - 1] := Ord(Watermark[I]);
  Result := FileHasWatermark(FileName, Bytes, Offset);
end;

Finally a few examples, all of which assume the name of the required file is in a string variable named FileName:

  • A zip file created by PKZip has preamble 50 4B 03 $04 in hex. So a test for such a zip file could be:
    if FileHasWatermark(FileName, [$50, $4B, $03, $04]) then
      ShowMessage('PKZip file');
  • Some versions of the old style Windows help file have the byte sequence 00 00 FF FF FF FF at offset 6. The test is:
    if FileHasWatermark(FileName, [$00, $00, $FF, $FF, $FF, $FF], 6) then
      ShowMessage('WinHelp file');
  • I'm tinkering about with the Game of Life at the moment and have found there are two versions of the Life file format - 1.05 and 1.06 - which both use the .lif file extension but have different ASCII preambles. Here's a way to distinguish them using the ASCII overload of our function:
    if FileHasWatermark(FileName, '#Life 1.05') then
      ShowMessage('Life v1.05 file format')
    else if FileHasWatermark(FileName, '#Life 1.06') then
      ShowMessage('Life v1.06 file format')
    else
      ShowMessage('Invalid Life file format');

Hope that's useful to someone.

EDIT: Versions of these routines are now available from the Code Snippets Database.

No comments: