20 October 2012

Checking File Preambles and Watermarks

I'm constantly having to check the preambles of files for some kind of signature (or "watermark") that identifies or validates the file's format.

I've re-invented the wheel many times over while doing this. Time, I thought, for a little helper routine or two.

First, a little routine to check the next N bytes from a stream for a given sequence of bytes (the "watermark").

function StreamHasWatermark(const Stm: TStream;
  const Watermark: array of Byte): Boolean;
var
  StmPos: Int64;
  Buf: array of Byte;
  I: Integer;
begin
  Assert(Length(Watermark) > 0, 'No "watermark" specified');
  Result := False;
  StmPos := Stm.Position;
  try
    if Stm.Size - StmPos < Length(Watermark) then
      Exit;
    SetLength(Buf, Length(Watermark));
    Stm.ReadBuffer(Pointer(Buf)^, Length(Buf));
    for I := Low(Buf) to High(Buf) do
      if Buf[I] <> Watermark[I] then
        Exit;
    Result := True;
  finally
    Stm.Position := StmPos;
  end;
end;

Pass it a stream and an array containing the required "watermark" and the routine checks whether the watermark exists at the current position† in the stream. The original stream position is restored after checking. This means that the routine can be called more than once to test for different watermarks without having to worry about keeping track of the stream position.

† I designed the routine to check from the current stream position rather than the beginning of the stream because some of the files / streams may have the required sequence of bytes offset from the start.
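
To see the position-restoring behaviour in action, here's a small sketch that runs a few checks against a memory stream. DemoStreamHasWatermark and its test data are made up for illustration; the sketch assumes StreamHasWatermark from above is in scope, that the Classes unit is in the uses clause and that assertions are enabled.

procedure DemoStreamHasWatermark;
const
  // Hypothetical test data: the PKZip signature followed by two padding bytes
  Data: array[0..5] of Byte = ($50, $4B, $03, $04, $00, $00);
var
  Stm: TMemoryStream;
begin
  Stm := TMemoryStream.Create;
  try
    Stm.WriteBuffer(Data, SizeOf(Data));
    Stm.Position := 0;
    // A failed test leaves the stream position where it was ...
    Assert(not StreamHasWatermark(Stm, [$00, $00, $FF, $FF]));
    Assert(Stm.Position = 0);
    // ... and so does a successful one, so tests can be chained freely
    Assert(StreamHasWatermark(Stm, [$50, $4B, $03, $04]));
    Assert(Stm.Position = 0);
    // To test at an offset, move the stream position before calling
    Stm.Position := 2;
    Assert(StreamHasWatermark(Stm, [$03, $04]));
    Assert(Stm.Position = 2);
    // A watermark longer than the remaining data is simply not found
    Stm.Position := 4;
    Assert(not StreamHasWatermark(Stm, [$00, $00, $00]));
    Assert(Stm.Position = 4);
  finally
    Stm.Free;
  end;
end;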

That's the core functionality taken care of, so how can we use the routine?

How about this generalised routine to check a file watermark or preamble?

function FileHasWatermark(const FileName: string;
  const Watermark: array of Byte; const Offset: Integer = 0): Boolean;
  overload;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    FS.Position := Offset;
    Result := StreamHasWatermark(FS, Watermark);
  finally
    FS.Free;
  end;
end;

This routine is pretty self-explanatory: it looks for the given sequence of bytes (Watermark) in the named file. It also has an optional parameter that lets you specify the offset of the watermark in the file.

Quite often watermarks are specified as ASCII text, so I've created an overloaded function that takes an ASCII (actually ANSI) watermark instead of an array of bytes. Here it is:

function FileHasWatermark(const FileName: string;
  const Watermark: AnsiString; const Offset: Integer = 0): Boolean;
  overload;
var
  Bytes: array of Byte;
  I: Integer;
begin
  SetLength(Bytes, Length(Watermark));
  for I := 1 to Length(Watermark) do
    Bytes[I - 1] := Ord(Watermark[I]);
  Result := FileHasWatermark(FileName, Bytes, Offset);
end;

Finally, a few examples, all of which assume the name of the required file is in a string variable named FileName:

  • A zip file created by PKZip has preamble 50 4B 03 04 in hex. So a test for such a zip file could be:
    if FileHasWatermark(FileName, [$50, $4B, $03, $04]) then
      ShowMessage('PKZip file');
  • Some versions of the old style Windows help file have the byte sequence 00 00 FF FF FF FF at offset 6. The test is:
    if FileHasWatermark(FileName, [$00, $00, $FF, $FF, $FF, $FF], 6) then
      ShowMessage('WinHelp file');
  • I'm tinkering about with the Game of Life at the moment and have found there are two versions of the Life file format - 1.05 and 1.06 - which both use the .lif file extension but have different ASCII preambles. Here's a way to distinguish them using the ASCII overload of our function:
    if FileHasWatermark(FileName, '#Life 1.05') then
      ShowMessage('Life v1.05 file format')
    else if FileHasWatermark(FileName, '#Life 1.06') then
      ShowMessage('Life v1.06 file format')
    else
      ShowMessage('Invalid Life file format');
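
If there are several formats to test for, the chain of if statements could be replaced by a lookup table. What follows is only a sketch: TFileSignature, KnownSignatures and IdentifyFile are hypothetical names, not part of the routines above, and the code assumes the AnsiString overload of FileHasWatermark is in scope. The table sticks to signatures that are plain ASCII ('PK'#3#4 is the PKZip preamble 50 4B 03 04), since bytes above $7F in a string literal may not survive conversion to AnsiString on every code page. Those are better handled with the byte array overload.

type
  // Hypothetical record pairing a description with a signature and its offset
  TFileSignature = record
    Name: string;
    Watermark: AnsiString;
    Offset: Integer;
  end;

const
  // Illustrative table built from the ASCII-only examples above
  KnownSignatures: array[0..2] of TFileSignature = (
    (Name: 'PKZip file'; Watermark: 'PK'#3#4; Offset: 0),
    (Name: 'Life v1.05 file format'; Watermark: '#Life 1.05'; Offset: 0),
    (Name: 'Life v1.06 file format'; Watermark: '#Life 1.06'; Offset: 0)
  );

function IdentifyFile(const FileName: string): string;
var
  I: Integer;
begin
  // Return the description of the first signature found in the file
  for I := Low(KnownSignatures) to High(KnownSignatures) do
    if FileHasWatermark(
      FileName, KnownSignatures[I].Watermark, KnownSignatures[I].Offset
    ) then
    begin
      Result := KnownSignatures[I].Name;
      Exit;
    end;
  Result := 'Unknown file format';
end;

Note that each test re-opens the file, which is fine for a handful of signatures but wasteful for a long list. Opening the file once and calling StreamHasWatermark for each entry would avoid that.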

Hope that's useful to someone.

EDIT: Versions of these routines are now available from the Code Snippets Database.

New look blog

Just finished giving the blog a new look and feel. I hope it's clearer and less fussy than the previous version.

The syntax highlighter styling has been switched to emulate the Delphi "Classic" theme, e.g.:

procedure Foo;
begin
  // example of highlighter
  ShowMessage('The answer is ' + IntToStr(42));
end;

So there we are then. Comments appreciated.

13 October 2012

CodeSnip 4 Emerges

After a lot of development CodeSnip v4 has finally been released. You can get it from the CodeSnip Download Page or from SourceForge.

Here's a list of the key new features and changes:

Snippet Handling

  • New "unit" and "class" snippet kinds that can include complete units and classes (and advanced records) in the database. Both can be test compiled and classes / advanced records can also be included in generated units.
  • Snippets from both the main and user databases can now be duplicated. This is very useful if you have created a snippet and want to create another one that shares a lot of the source code, dependencies etc.
  • The full range of Unicode characters can now be used for snippet names, descriptions etc., and for source code.
  • There is now finer control over warnings in generated code via the $WARN directive.
  • The names of referenced units may now contain dots so that Delphi namespaces can be used.

User Interface

  • The new multi-tab display can show details of more than one snippet, category etc. in the main display.
  • The main menu has been re-organised and there are new menu and tool-bar glyphs.
  • The structure of snippet pages in the details pane is now customisable: various page elements can be omitted and the order of elements can be changed. Each snippet type has its own page customisation.
  • The number of compilers that appear in the compile results table in the details pane can now be limited.
  • The colours used for snippet names and headings can now be customised.
  • Snippets can now have an optional "display name" that, unlike the snippet's name, does not need to be unique. This is useful for giving meaningful names to snippets such as overloaded functions. For example snippets ResizeRect_A and ResizeRect_B now have display names ResizeRect (TSize overload) and ResizeRect (Longint overload) which are much more meaningful.
  • Snippet descriptions can now be formatted and can contain multiple paragraphs, just like "extra" text. Both are edited using the new Markup Editor.
  • Syntax highlighting of source code of user defined snippets can now be switched off on a per-snippet basis. This is mainly of use for "freeform" snippets that may be used for notes or snippets in languages other than Pascal.
  • Information about how a snippet from the online Code Snippets Database was tested is now displayed by means of a glyph at the top right of the detail pane.
  • The Welcome page has been completely redesigned to be cleaner and to provide more useful information.

Test Compilation

  • You can specify the paths to be searched by compilers when looking for used units. This lets you compile snippets that use units other than those provided in the Delphi VCL and RTL. For example you could specify the path to the Indy components if you need to reference them from your snippets.
  • Results of test compilations now appear in a dialogue box instead of in a tab in the detail pane.

Other Features

  • There is a new option on the Tools menu that checks availability of new versions of CodeSnip.
  • Text and compiler searches can now optionally refine a previous search rather than always searching the entire database.
  • The contents of a category can now be printed.

For other features of v4 please read the change log for release 4.0.0 and all preceding pre-releases, including alpha (preview) and beta releases and release candidates.