16 February 2014

Workaround for a bug in SourceForge's Blog RSS feed

I've just been setting up a new blog for CodeSnip on SourceForge (of which more later).

On setting up a FeedBurner feed for the blog I've come across a bug in the SourceForge news RSS feed that breaks FeedBurner.

The problem is that FeedBurner (correctly) treats the values of the SF feed's <guid> tags as valid URLs and uses them as the destination URLs of some links. Unfortunately the values of the <guid> tags in the SF feed are not valid URLs, so FeedBurner generates bad links.

The RSS specification states that <guid> tags should signal whether or not they contain valid URLs by means of an isPermaLink attribute. The tag's value must be a valid URL if isPermaLink is true but must not be treated as a URL if isPermaLink is false. When the attribute is absent it defaults to true. And here's the problem: SF doesn't provide an isPermaLink attribute, so FeedBurner assumes the value of each <guid> tag is a valid URL when in fact it's not.
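To make the default rule concrete, here's a minimal sketch of how a feed reader might decide whether a <guid> is a URL. The guidIsPermaLink function and the sample items are made up for illustration; they aren't part of FeedBurner or of the script below.

```php
<?php
// Hypothetical helper (not part of the original script): reports whether a
// <guid> element should be treated as a URL under the RSS default rule.
function guidIsPermaLink(DOMElement $guid): bool {
  // An absent attribute means the default applies, and the default is true.
  if (!$guid->hasAttribute('isPermaLink')) {
    return true;
  }
  return strtolower(trim($guid->getAttribute('isPermaLink'))) === 'true';
}

// Made-up sample items: the first mimics the SF feed (no attribute at all).
$xml = '<rss version="2.0"><channel>'
  . '<item><guid>abc123</guid></item>'
  . '<item><guid isPermaLink="false">def456</guid></item>'
  . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$results = [];
foreach ($dom->getElementsByTagName('guid') as $guid) {
  $isUrl = guidIsPermaLink($guid);
  $results[] = $isUrl;
  echo $guid->nodeValue, ' => ', $isUrl ? 'treated as URL' : 'opaque ID', "\n";
}
```

The first item has no attribute, so it's treated as a URL, which is exactly what trips up FeedBurner with the SF feed.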

This suggests a solution: we need to transform the SF feed, adding an isPermaLink attribute with the value false to every <guid> tag. This tells FeedBurner not to treat the <guid> values as URLs.

I've written a little PHP script to perform the required transformation. I can't modify the feed at source, so the next best thing is to read it in, modify it and make the modified feed available via a new URL.

Here's the script:

<?php
/*
  Fix for SF RSS feed bug: https://sourceforge.net/p/allura/tickets/6687/
  Reads RSS source code from SourceForge and re-renders it, adding an
  isPermaLink=false attribute to every <guid> tag.
*/

mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');

class SFBlogFeedConverter {

  private $feedURL;   // URL of SourceForge RSS feed
  private $rssDOM;    // RSS DOM

  public function __construct($feedURL) {
    $this->rssDOM = new DOMDocument();
    $this->feedURL = $feedURL;
  }

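  // Loads the feed, marks every <guid> as a non-permalink and outputs the result
  public function RenderRSS() {
    // @ suppresses libxml warnings; a failed load is reported as a 500 below
    if (!@$this->rssDOM->load($this->feedURL)) {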
      $code = 500;
      $desc = 'Internal Server Error';
      header("HTTP/1.0 $code $desc");
      header("Status: $code $desc");
      header('Content-Type: text/html; charset=utf-8');
      echo "<!DOCTYPE html>\n"
        . "<html>\n"
        . "<head>\n"
        . "<meta charset=\"utf-8\">\n"
        . "<title>$code $desc</title>\n"
        . "</head>\n"
        . "<body>\n"
        . "<h1>$code $desc</h1>\n"
        . "<p>Can't open CodeSnip blog feed on SourceForge</p>\n"
        . "</body>\n"
        . "</html>\n";
      return;
    }

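    // Mark every <guid> as an opaque identifier rather than a URL
    $guidNodes = $this->rssDOM->getElementsByTagName('guid');
    foreach ($guidNodes as $guidNode) {
      $guidNode->setAttribute('isPermaLink', 'false');
    }
    // Serialise the modified DOM and send it with matching headers
    header('Content-Type: application/xml; charset=utf-8');
    $xml = $this->rssDOM->saveXML();
    header('Content-Length: ' . strlen($xml));  // strlen counts bytes
    echo $xml;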
  }
}

$blogXML = new SFBlogFeedConverter(
  // Replace [PROJECT] below with the required project name
  'http://sourceforge.net/p/[PROJECT]/blog/feed'
);
$blogXML->RenderRSS();
?>

Get the code from GitHub

First off we just make sure we use UTF-8 encoding.

The main code is wrapped up in the SFBlogFeedConverter class. There's a constructor that takes the URL of the feed we're converting as a parameter and stores it. It also creates a DOM object to use later.

The meat of the code is in the RenderRSS method. We try to load the RSS from the URL passed to the constructor. If that fails we send an error 500 response and return. If the DOM loads OK we get all the <guid> nodes and give each one an isPermaLink attribute with the value false. Finally we write the required headers followed by the converted XML.

The last bit of code just creates a SFBlogFeedConverter instance with the required feed URL and then calls RenderRSS to perform the conversion.
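If you want to try the transformation without hitting SourceForge, the walkthrough above can be sketched against an inline string. This is only a rough stand-in under my own assumptions: addIsPermaLinkFalse is a hypothetical helper, the sample feed is invented, and it uses loadXML on a string where the real script uses load on a URL.

```php
<?php
// Hypothetical helper for experimenting offline; the original script does the
// same work inside SFBlogFeedConverter::RenderRSS after loading from a URL.
function addIsPermaLinkFalse(string $rssXml): ?string {
  $dom = new DOMDocument();
  if (!@$dom->loadXML($rssXml)) {
    return null;  // input was not well-formed XML
  }
  foreach ($dom->getElementsByTagName('guid') as $guid) {
    $guid->setAttribute('isPermaLink', 'false');
  }
  $out = $dom->saveXML();
  return $out === false ? null : $out;
}

// Made-up sample feed standing in for the SourceForge output.
$sample = '<?xml version="1.0" encoding="utf-8"?>' . "\n"
  . '<rss version="2.0"><channel><title>Example</title>'
  . '<item><title>Post</title><guid>0123456789abcdef</guid></item>'
  . '</channel></rss>';

$converted = addIsPermaLinkFalse($sample);
echo $converted;
```

Returning null on a parse failure plays the role of the error 500 branch in the real script; how you report the failure is up to you.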

To use the script, upload it to a web server and make it accessible via a URL. Then point FeedBurner (etc.) at the new URL and all should be fine.
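Before pointing FeedBurner at the new URL it may be worth checking that the output really has been converted. Here's a small sketch of one way to do that. The countUnmarkedGuids function is hypothetical, and the sample string stands in for whatever your uploaded script returns.

```php
<?php
// Hypothetical checker, not part of the original script: counts <guid>
// elements that a feed reader would still treat as URLs.
function countUnmarkedGuids(string $rssXml): int {
  $dom = new DOMDocument();
  if (!@$dom->loadXML($rssXml)) {
    throw new InvalidArgumentException('Feed is not well-formed XML');
  }
  $unmarked = 0;
  foreach ($dom->getElementsByTagName('guid') as $guid) {
    // getAttribute returns an empty string when the attribute is absent
    if ($guid->getAttribute('isPermaLink') !== 'false') {
      $unmarked++;
    }
  }
  return $unmarked;
}

// In practice $feed would be fetched from wherever the script was uploaded,
// e.g. file_get_contents('https://example.com/sf-feed.php') (placeholder URL).
$feed = '<rss version="2.0"><channel>'
  . '<item><guid isPermaLink="false">a1</guid></item>'
  . '<item><guid>b2</guid></item>'
  . '</channel></rss>';

echo countUnmarkedGuids($feed), " guid(s) still lack isPermaLink=\"false\"\n";
```

A count of zero on the real output suggests every <guid> was converted.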

I pointed FeedBurner at the converted feed for the CodeSnip blog and now everything works fine.

EDIT - I no longer have this blog on SF, so I can't show you the fix in action.

If you're experiencing the same problem please feel free to use the script until such time as SF fix the bug.
