Facebook Instant Articles - conquering new channels for our Contentful-powered blog
Facebook Instant Articles is a feature of Facebook's mobile apps that allows almost instant loading of articles that are specially prepared by the publisher. It originally launched in May 2015 with select partners, including The New York Times, BBC News, Spiegel Online and National Geographic.
Since April 12, Instant Articles have been available to all publishers. Here at Contentful we believe that multi-platform publishing is the future, because what's important is the content itself, not its mode of presentation. In this spirit, most (more about that later) of the posts in the very blog you're reading right now are now available as Instant Articles.
Estimates are hard
Like most software developers, my estimates of how long a piece of work takes are, to put it nicely, somewhat underperforming. When I proposed adding Instant Articles to our blog, after a cursory reading of the documentation, I made the - in retrospect, very optimistic - prediction that I'd need two days to get it up and running. As it turns out, that was overly cocky: I hadn't carefully reviewed the constraints imposed by Facebook, nor had I accounted for my rather sparse Ruby skills.
Let's start at the beginning. Our blog had great prerequisites. It's a static site generated using Middleman; the content is managed through Contentful and is as structured as can be expected from a blog; the main body is Markdown parsed with Kramdown. I expected that some modifications to Kramdown's output would be necessary, but hopefully nothing major.
The beginning
The first steps are easy: get added to the Facebook Page for the site as at least an Analyst (or, if there's no Page, create one) and claim the URLs for the production and development environments by adding some metadata to the page.
The content of Instant Articles is hosted by Facebook itself. There are two ways of delivering that content: the Publishing API or a special RSS feed. Since we love our static site and want to avoid complicating our build process, we decided early on to go with the RSS approach. Note that the RSS feed contains the full markup of every article in CDATA sections, not just links to other documents. There are some limitations to this approach that one should be aware of:
The feed is ingested roughly every 10 minutes. Before ingestion, visitors will be directed to your normal site.
If an article has been modified (according to the specified modification time) more than 24 hours before ingestion, it will be ignored.
No more than 100 articles, sorted by modification time, will be ingested.
For a developer blog like ours, these are all acceptable.
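To make the last two limits concrete, here is a minimal sketch in plain Ruby. The `Article` struct, its field names and both helper names are assumptions made up for illustration; they are not part of Middleman or of Facebook's tooling.

```ruby
# Hypothetical article record; the field names are illustrative only.
Article = Struct.new(:title, :modified_at)

MAX_FEED_ITEMS   = 100            # at most 100 articles are ingested
FRESHNESS_WINDOW = 24 * 60 * 60   # 24 hours, expressed in seconds

# Most recently modified articles first, capped at the ingestion limit.
def feed_candidates(articles, limit: MAX_FEED_ITEMS)
  articles.sort_by(&:modified_at).reverse.first(limit)
end

# True if an ingestion happening at `now` would still consider the article,
# i.e. its modification time is no more than 24 hours in the past.
def picked_up?(article, now: Time.now)
  now - article.modified_at <= FRESHNESS_WINDOW
end
```

With a check like `picked_up?` it is easy to see why editing an old post without bumping its modification time has no effect on the Instant Article version.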
The first approach I took was to copy all articles within Middleman's sitemap and render them to HTML files using a template optimized for Instant Articles. Then I'd have the template for the RSS feed read these files and embed the content as a CDATA section. Lastly, before uploading the website to S3, I'd remove our magic Instant Article folder. Sounds complicated? It is. It's also completely unnecessary.
After struggling with my plan for a few hours I took a step back and looked at how our RSS feed is generated. It's a single file using xmlbuilder. It simply iterates through the data provided by middleman-blog and builds the file. The article itself is HTML, so I went with HAML for that part. Now I simply have two templates: one for the RSS feed and one partial that renders the HTML. To get you started, we've put a copy of our XML feed template on GitHub.
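For readers who only want the general shape of such a feed, the following is a minimal sketch in plain Ruby, not the template from our repository. It assumes the article markup goes into a `content:encoded` element wrapped in CDATA; the function names, the item hash keys and the ISO 8601 date format are illustrative assumptions that should be checked against Facebook's feed documentation.

```ruby
require 'time'

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escapes text that ends up in ordinary XML elements.
def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

# Wraps markup in a CDATA section. A literal "]]>" inside the markup would
# end the section early, so it is split across two sections.
def cdata(markup)
  "<![CDATA[#{markup.gsub(']]>', ']]]]><![CDATA[>')}]]>"
end

# items: array of hashes with :title, :url, :published_at (Time) and :html.
def instant_articles_feed(title:, link:, items:)
  lines = []
  lines << '<?xml version="1.0" encoding="UTF-8"?>'
  lines << '<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">'
  lines << '<channel>'
  lines << "<title>#{xml_escape(title)}</title>"
  lines << "<link>#{xml_escape(link)}</link>"
  items.each do |item|
    lines << '<item>'
    lines << "<title>#{xml_escape(item[:title])}</title>"
    lines << "<link>#{xml_escape(item[:url])}</link>"
    lines << "<guid>#{xml_escape(item[:url])}</guid>"
    # Date format is an assumption; plain RSS 2.0 uses RFC 822 dates instead.
    lines << "<pubDate>#{item[:published_at].utc.iso8601}</pubDate>"
    # The whole article document travels inside the feed, not a link to it.
    lines << "<content:encoded>#{cdata(item[:html])}</content:encoded>"
    lines << '</item>'
  end
  lines << '</channel>'
  lines << '</rss>'
  lines.join("\n")
end
```

Building the feed as a flat list of lines keeps the sketch free of dependencies; in a real Middleman project a builder-style template does the same job with less string handling.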
Facebook allows setting up two RSS feeds, one for development and one for production. I set up our dev system as the development feed, loaded it with empty content elements and confirmed that basic ingestion works. With that out of the way we come to the interesting part.
What would you do without a body?
For the article itself, there is a certain structure Facebook expects articles to adhere to. Mapping it to our blog posts was fairly straightforward and no information had to be omitted. The biggest complexity here stems from the fact that the feature image and its caption are optional; everything else maps quite nicely. Again, what we've built can be found on GitHub.
Notice the render_fbia_markdown function? To allow some more control of the rendered markup, we're not using HAML's :markdown filter, opting for a simple helper instead.
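The optional feature image is the only branching part of the structure, which a short sketch can show. The version below is plain Ruby rather than our HAML partial; the function and parameter names are hypothetical, and the element names reflect Facebook's published markup format as I understand it, so treat them as assumptions to verify against the current documentation.

```ruby
require 'time'

# Minimal HTML escaping for text nodes and attribute values.
def h(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Builds an Instant Article document as a string. body_html is expected to
# be markup that has already been prepared for Instant Articles.
def instant_article(title:, canonical_url:, published_at:, author:, body_html:,
                    image_url: nil, caption: nil)
  cover = ''
  if image_url
    # A caption only makes sense together with an image, so it nests here.
    figcaption = caption ? "<figcaption>#{h(caption)}</figcaption>" : ''
    cover = "<figure><img src=\"#{h(image_url)}\" />#{figcaption}</figure>"
  end

  published = published_at.utc
  [
    '<!doctype html>',
    '<html lang="en" prefix="op: http://media.facebook.com/op#">',
    '<head>',
    '<meta charset="utf-8">',
    "<link rel=\"canonical\" href=\"#{h(canonical_url)}\">",
    '<meta property="op:markup_version" content="v1.0">',
    '</head>',
    '<body>',
    '<article>',
    '<header>',
    "<h1>#{h(title)}</h1>",
    "<time class=\"op-published\" datetime=\"#{published.iso8601}\">#{published.strftime('%Y-%m-%d')}</time>",
    "<address>#{h(author)}</address>",
    cover,
    '</header>',
    body_html,
    '</article>',
    '</body>',
    '</html>'
  ].reject(&:empty?).join("\n")
end
```

Posts without a feature image simply produce a header without a figure, and a caption on its own is ignored.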
At this point my two days are almost up. I just need some help from our designers to set up the styles, a code review and some testing to make sure everything still works. Gonna be close, but two days still seems doable.
Testing
Facebook recommends two things when it comes to testing Instant Articles. First, there's the article editor, which helpfully points out some mistakes. Second, the Pages Manager application allows previewing articles on an iOS or Android device (except for tablets). At the time our blog contained exactly 95 articles; checking them all manually was exhausting but doable.
The first step was looking at the markup for the articles flagged by Facebook. Over a third of all articles were flagged. Darn. Most were minor things like a <br/> in the source that Kramdown turned into <p><br/></p>, which Facebook considers an empty element. Some articles that were flagged contained design elements that could not be recreated in Instant Articles; for these, a simple blacklist was created. The two days are long over.
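Empty paragraphs like these can also be removed after rendering. The following is a minimal sketch, assuming the rendered HTML is available as a string; the constant and helper names are made up for illustration, and paragraphs carrying attributes are deliberately left alone.

```ruby
# Matches a paragraph that contains nothing but whitespace and <br> variants,
# plus any whitespace that follows it.
EMPTY_PARAGRAPH = %r{<p>(?:\s|<br\s*/?>)*</p>\s*}i

def strip_empty_paragraphs(html)
  html.gsub(EMPTY_PARAGRAPH, '')
end
```

A line break inside a paragraph that also has text, such as `<p>One<br/>Two</p>`, is not touched.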
An error flagged by Facebook
A couple of years ago we changed our video host from YouTube to Wistia. Unfortunately, we had no consistent way of embedding these videos into the articles. The responsive embed and, as I later discovered, the standard embed will not work in Instant Articles. The only reliable way is to use the fallback embed with a fixed size. Facebook is smart enough to surround any iframe in an article with the required <figure class="op-embed"> so we did not have to change that. It still meant finding all articles with videos in them (grep is the quickest way to go), going into the Contentful app, finding the video ID, going into Wistia, getting a new embed code, going back to Contentful and replacing the embed. Another half day gone.
We're also using Kramdown's block-level attributes feature. Originally I just filtered them all out before handing the Markdown to Kramdown. Since we're also using them for image captions - a feature supported by Instant Articles - we needed to retain that information.
Getting images to work correctly was a problem anyway. Having an img element with a src attribute within a p element is not allowed. This and a number of other problems (no support for <code> elements, no support for heading levels beyond two, no relative URLs) led me to create a custom Kramdown converter to deal with all of them. Luckily Kramdown is flexible enough to allow for this. I am, however, not exactly a seasoned Ruby developer, so it took me a bit more than a day to find all these issues and handle them correctly. The code is open source and on GitHub, so you don't have to do it yourself.
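To give a feel for the kind of rewriting involved, here is a simplified sketch that post-processes already rendered HTML with regular expressions. This is a swapped-in technique for illustration, not the custom Kramdown converter from our repository; all function names are hypothetical, and only three of the problems (images in paragraphs, deep headings, root-relative URLs) are covered.

```ruby
# A paragraph that holds nothing but an image becomes a figure.
def hoist_images(html)
  html.gsub(%r{<p>\s*(<img\b[^>]*>)\s*</p>}i) { "<figure>#{$1}</figure>" }
end

# Headings of level three and deeper are flattened to level two.
def flatten_headings(html)
  html.gsub(%r{<(/?)h[3-6]\b([^>]*)>}i) { "<#{$1}h2#{$2}>" }
end

# Root-relative src/href values ("/images/a.png") get the site URL prepended.
# Protocol-relative ("//cdn...") and absolute URLs are left untouched.
def absolutize_urls(html, base_url)
  base = base_url.chomp('/')
  html.gsub(%r{\b(src|href)="/(?!/)([^"]*)"}) { %(#{$1}="#{base}/#{$2}") }
end

def prepare_for_instant_articles(html, base_url)
  absolutize_urls(flatten_headings(hoist_images(html)), base_url)
end
```

Regular expressions are fragile on arbitrary HTML, which is one reason a converter that works on Kramdown's element tree is the sturdier choice once the number of special cases grows.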
Why isn't this article an Instant Article?
After fixing all the issues pointed out by the article editor, next up was a visual inspection using the Facebook Pages app. In the back of my mind, I always knew that the big, beautiful code blocks we have in so many articles would be a problem to get right, but I had put this aside while being busy with other things. Turns out <pre> elements are not rendered at all. The next idea was to simply embed them as interactive elements. Turns out that doesn't work well.
In the end, code blocks are simply not supported. Any article on our blog that has one is filtered out and will not be an Instant Article. A small helper function decides whether an article has one.
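The helper itself is not reproduced here. A minimal sketch of such a check, assuming it runs against the rendered HTML and that code blocks end up as <pre> elements, could look like this; the `Post` struct and both function names are hypothetical.

```ruby
# Hypothetical post record; only the rendered HTML matters for the check.
Post = Struct.new(:title, :html)

# True if the rendered article contains at least one <pre> element.
def contains_code_block?(html)
  html.match?(/<pre\b/i)
end

# Keeps only the posts that can be published as Instant Articles.
def instant_article_candidates(posts)
  posts.reject { |post| contains_code_block?(post.html) }
end
```

Checking the rendered output rather than the Markdown source avoids having to recognise every way a code block can be written (fenced, indented, or highlighted by a plugin).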
Taking it live
Finally, all issues are solved. Time to submit this to Facebook's review. The review is supposed to take 3-5 days and can only be started after a certain number of articles (either 10 or 50, depending on where you look) have been submitted through an RSS feed or the API. Manually created articles don't count. Also, one can't submit for review with articles from the development feed; they have to be from the production feed.
That means it's time to take this live! I had gotten my code reviews earlier, so at 7 pm I'm finally able to merge my code into master. After waiting for the site to redeploy, I set up the production feed and test one last time that everything works - including checking that all, now just 64, articles still look fine in the production environment. At roughly 8 pm I submit for review and go home.
The review took either four or five days. One would expect that there's a notification when the review concludes. One would be wrong. After the review the existing articles are not automatically published; that has to be done manually. Fortunately, you can enable automatic publishing for all future articles.
Conclusion
Overall the process wasn't hard, just daunting. It took me approximately four and a half days to create everything and it took Facebook another five days to review, making the overall process take two weeks. The basic setup was easy enough, aided in large part by the structured content. The trouble is that within any corpus of content there are enough outliers, in terms of either mistakes or uncommon elements, to require manual review. If I were to do it again, I'd first review what elements appear in the content and how they map to Instant Articles. The number of articles that can't be turned into an Instant Article will give an indication of whether it's worthwhile to proceed. It should also give you a rough estimate of the complexity.
I hope Facebook's platform evolves to include more facilities for software-related writing. Having syntax-highlighted code blocks (or at least <pre> elements) would be a huge boon.
By the way, my estimate of the time I need to write this post was also way off. Turns out I had much more to say than I expected.