Enhancing search experience with Algolia
To provide your users with fast and responsive search, it is recommended to combine Contentful with a search provider like Algolia. Algolia offers fast and scalable APIs with easy-to-use search UI-widgets.
To implement a search widget like the one mentioned above, refer to the “getting started” guide by Algolia.
Contentful and Algolia are very flexible products that can be customized to suit your needs. While this flexibility is great to build whatever you want, it also means that different content structures need different solutions to get the best result.
Let’s assume that you are running a typical blog and want to make the content of the hundreds of great articles you wrote discoverable and searchable. Your content model includes a post type that maps one-to-one to the particular article URL and resource online.
A content structure like the one above enables you to push the needed data directly into Algolia using webhooks.
Create indices and records in Algolia
After you have created an account at Algolia, you can create indices. An index is a bucket of searchable data. Create a new index with a name you prefer (this tutorial uses "posts"), and Algolia presents you with three options to import data: adding records manually, uploading a JSON or CSV file, or using the API.
![Algolia interface showing options to import data](algolia-index-options update)
You can use the API in combination with webhooks to index your content.
The Algolia webhook template
To connect Contentful directly with Algolia, you can use the Algolia webhook template. The creation dialog asks for three credentials, Algolia App ID, Index name and API key, which you’ll find in the Algolia web interface.
After you enter these credentials, the webhook setup automatically creates two webhooks:
- one to create or update records for entries that were published
- one to delete records for entries that were unpublished
When you now publish an entry, a webhook is sent to Algolia, which indexes the updated entry as an Algolia record. If you inspect the indexed data in Algolia, you will find that a lot of Contentful metadata made its way into it.
![Algolia interface showing indexed Contentful metadata](Screen Shot algolia-index-webhook-update)
Contentful provides customizable payloads which let you define the body of a sent webhook request using JSON pointers.
![Custom Contentful webhook payload using JSON pointers](Screen Shot 2022-04-06 at 15.02.25)
JSON pointers are a defined string syntax for identifying a specific value within a JSON object. The syntax is fairly easy to use, and with it you can clean up the Contentful webhook payloads and make them a perfect fit for what Algolia expects.
{
  "url": "/today-i-learned/{ /payload/fields/slug/en-US }/",
  "content": "{ /payload/fields/body/en-US }",
  "title": "{ /payload/fields/title/en-US }",
  "objectID": "{ /payload/sys/id }"
}
Webhook transformations make it possible to index content in Algolia without additional setup or services required.
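To make the placeholder syntax more concrete, the following sketch resolves JSON pointers against a sample object and fills them into template strings. It is an illustration only: resolvePointer, renderTemplate and the sample data are hypothetical names and values, and this is not necessarily how Contentful performs the transformation internally. The token unescaping follows the JSON Pointer specification (RFC 6901).

```javascript
// Resolve a JSON pointer such as "/payload/fields/slug/en-US" against an object.
function resolvePointer(doc, pointer) {
  if (pointer === '') return doc;
  return pointer
    .slice(1) // drop the leading "/"
    .split('/')
    // RFC 6901 escapes: "~1" stands for "/", "~0" stands for "~"
    .map(token => token.replace(/~1/g, '/').replace(/~0/g, '~'))
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Replace every "{ /some/pointer }" placeholder in a template string.
function renderTemplate(template, doc) {
  return template.replace(/\{\s*(\/[^}]*?)\s*\}/g, (match, pointer) => {
    const value = resolvePointer(doc, pointer);
    return value === undefined ? '' : String(value);
  });
}

// Hypothetical sample data shaped like the pointers used in the payload above.
const sample = {
  payload: {
    sys: { id: 'abc123' },
    fields: {
      slug: { 'en-US': 'hello-world' },
      title: { 'en-US': 'Hello world' }
    }
  }
};

const record = {
  url: renderTemplate('/today-i-learned/{ /payload/fields/slug/en-US }/', sample),
  title: renderTemplate('{ /payload/fields/title/en-US }', sample),
  objectID: renderTemplate('{ /payload/sys/id }', sample)
};

console.log(record);
```

Each placeholder is looked up independently, so a pointer that matches nothing simply yields an empty string in this sketch.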
Cases in which indexing via webhooks is not enough
The direct connection of Algolia and Contentful works very well when one Contentful entry maps to one URL, but it’s recommended to define your own logic when you’re dealing with more complex scenarios.
Advanced data manipulation before it’s sent
When you’re dealing with large markdown fields, for example, it makes sense to store a plain-text version in Algolia instead of the markdown, because plain text can lead to better search results.
While you can rename payload fields, omit fields and shape the payload to your needs, advanced manipulations like this are currently not possible in a webhook payload.
Inclusion of referenced item data in webhooks
Depending on your use case, you might want to index an entry together with all the data it links to because it is a composition of different Contentful entities. Referenced entry data is not available in the webhook payload, so in this case it’s recommended to write your own logic to feed Algolia. The Contentful client libraries provide a way to retrieve linked data.
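As a rough sketch of such custom logic, the function below flattens a post and the entries it links to into a single record. It assumes the entry was fetched with a client library that already resolved the links, so linked entries are full objects. The author and tags reference fields, as well as their name and label fields, are hypothetical and only serve as an example.

```javascript
// Flatten a post and its linked entries into one Algolia record.
// Assumption: links are already resolved, so linked entries carry `fields`.
// `author`, `tags`, `name` and `label` are hypothetical field names.
function toRecordWithReferences(post) {
  const { title, slug, author, tags = [] } = post.fields;
  return {
    objectID: post.sys.id,
    title,
    url: `/blog/${slug}/`,
    // An unresolved link has no `fields`, so fall back to null.
    author: author && author.fields ? author.fields.name : null,
    // Skip unresolved tag links instead of failing.
    tags: tags.filter(tag => tag.fields).map(tag => tag.fields.label)
  };
}

// Hypothetical entry with one resolved author and one unresolved tag link.
const examplePost = {
  sys: { id: 'post-1' },
  fields: {
    title: 'Hello world',
    slug: 'hello-world',
    author: { sys: { id: 'author-1' }, fields: { name: 'Jane Doe' } },
    tags: [
      { sys: { id: 'tag-1' }, fields: { label: 'search' } },
      { sys: { id: 'tag-2', type: 'Link', linkType: 'Entry' } }
    ]
  }
};

console.log(toRecordWithReferences(examplePost));
```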
Logic defining a specific webhook payload
There may also be situations in which you want to index a field or a set of data only when certain conditions are met.
One way to handle this is to define separate webhooks that are filtered by a specific environment, content type or entity id.
It is currently not possible to craft payload properties conditionally, so in this case it’s recommended to write your own custom index script.
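A custom script can express such conditions in plain code. The sketch below is one possible shape, not a prescribed solution: hideFromSearch and excerpt are hypothetical fields that a content model might or might not contain.

```javascript
// Decide per entry whether and how it is indexed.
// `hideFromSearch` and `excerpt` are hypothetical fields for this example.
function buildRecord(post) {
  const { title, slug, body, excerpt, hideFromSearch } = post.fields;

  // Condition 1: skip entries that editors excluded from search.
  if (hideFromSearch) return null;

  return {
    objectID: post.sys.id,
    title,
    url: `/blog/${slug}/`,
    // Condition 2: prefer the short excerpt and fall back to the full body.
    content: excerpt ? excerpt : body
  };
}

// Hypothetical entries to demonstrate both conditions.
const entries = [
  { sys: { id: '1' }, fields: { title: 'A', slug: 'a', body: 'Long body', excerpt: 'Short' } },
  { sys: { id: '2' }, fields: { title: 'B', slug: 'b', body: 'Body only' } },
  { sys: { id: '3' }, fields: { title: 'C', slug: 'c', body: 'Hidden', hideFromSearch: true } }
];

// Drop the entries for which buildRecord returned null.
const records = entries.map(buildRecord).filter(Boolean);
console.log(records);
```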
A serverless function can do all this
The go-to solution for advanced cases is a serverless function. You can use AWS Lambda or Google Cloud Functions to quickly spin up your own HTTP endpoint that receives webhooks, transforms the Contentful data to your needs and sends it to Algolia.
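The core of such a function can be kept independent of the hosting platform. The following sketch only decides what should happen for an incoming webhook; the surrounding HTTP handler and the actual Algolia call are left out. It assumes that the webhook uses the default payload, that the topic is read from the X-Contentful-Topic request header, and that the entry has title, slug and body fields in the en-US locale. The function name planIndexAction is hypothetical.

```javascript
const PUBLISH = 'ContentManagement.Entry.publish';
const UNPUBLISH = 'ContentManagement.Entry.unpublish';

// Translate a Contentful webhook call into an action for the search index.
// `topic` is the value of the X-Contentful-Topic header,
// `payload` is the parsed JSON body of the webhook request.
function planIndexAction(topic, payload, locale = 'en-US') {
  const objectID = payload.sys.id;

  if (topic === UNPUBLISH) {
    return { action: 'delete', objectID };
  }

  if (topic === PUBLISH) {
    // Fields in the default webhook payload are keyed by locale.
    const localized = field => (field ? field[locale] : undefined);
    const { title, slug, body } = payload.fields;
    return {
      action: 'save',
      record: {
        objectID,
        title: localized(title),
        content: localized(body),
        url: `/blog/${localized(slug)}/`
      }
    };
  }

  // Every other topic is of no interest to the index.
  return { action: 'ignore' };
}

// Hypothetical webhook body for a published post.
const published = planIndexAction(PUBLISH, {
  sys: { id: 'post-1' },
  fields: {
    title: { 'en-US': 'Hello world' },
    slug: { 'en-US': 'hello-world' },
    body: { 'en-US': 'Some text' }
  }
});
console.log(published);
```

Returning a plain description of the action keeps the logic easy to test; the platform-specific handler can then save or delete the record in Algolia accordingly.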
Index and manipulate entries with a custom Node.js script
Another common approach to indexing data is to write a script that fetches particular entries, for example all entries of a certain content type, from Contentful and indexes them in Algolia to make them searchable. Such a script can run in your build pipeline or in a serverless function triggered by a webhook.
Fetch article data from Contentful
To fetch all the articles from Contentful, it’s recommended to use the JavaScript client library as it provides you with rate limit handling and link resolution out of the box. To access Contentful data, you have to provide a space id and a delivery access token to the createClient function. You can find more information in the “Getting started with Contentful and JavaScript” guide.
const { createClient } = require('contentful');
const {
  CTF_SPACE_ID: space, // '8gi...'
  CTF_CDA_TOKEN: accessToken // 'c6c...'
} = process.env;
const ctfClient = createClient({
  space,
  accessToken
});
With the Contentful client library at hand you can start fetching data that is indexed later in Algolia.
Fetching entries using the entries collection endpoint (getEntries)
The getEntries method uses the entry collection HTTP endpoint (/entries). It supports several search parameters. In case you want to fetch the data with a different programming language, the documentation includes examples for several other languages, too.
// ...
try {
  // fetch entries of type post from Contentful
  const { items } = await ctfClient.getEntries({
    // only fetch entries of type post
    content_type: 'post',
    // increase limit to 1000
    limit: 1000
  });
} catch (err) {
  console.error(err);
}
Be aware that the collection endpoint returns 100 entries per response by default. This limit can be increased to a maximum of 1000 entries. To fetch more entries than that, you have to paginate the entries yourself, or you can check out the sync endpoint, which is paginated in the JavaScript client library by default (the sync endpoint has some other characteristics to deal with, though).
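Manual pagination could look like the following sketch, which relies on the skip and limit parameters and on the total value of the collection response. The helper name fetchAllEntries is hypothetical, and the explicit order parameter is an assumption added here to keep the pages stable between requests.

```javascript
// Fetch all entries matching `query` page by page.
// `client` is a Contentful client as created by createClient().
async function fetchAllEntries(client, query, pageSize = 1000) {
  const all = [];
  let total = Infinity;

  while (all.length < total) {
    const page = await client.getEntries({
      order: 'sys.createdAt', // stable ordering across pages (assumption)
      ...query,
      skip: all.length,
      limit: pageSize
    });
    total = page.total;
    // Guard against an endless loop if the API returns fewer items than announced.
    if (page.items.length === 0) break;
    all.push(...page.items);
  }

  return all;
}

// Usage, assuming `ctfClient` was created as shown earlier:
// const posts = await fetchAllEntries(ctfClient, { content_type: 'post' });
```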
Define and manipulate searchable data
At this stage, you can transfer all the data to Algolia, but similar to the webhook example it is recommended to "clean it up" first.
For the described case only three fields of every article are relevant. The Algolia full-text search should consider the article's title and the actual content stored in body, and be able to return the URL where the corresponding article page is available.
// ...
const removeMd = require('remove-markdown');
try {
  // fetch entries from Contentful
  const { items } = await ctfClient.getEntries({...})
  // map and manipulate the data you’re interested in indexing in Algolia
  const posts = items.map(post => ({
    url: `/blog/${post.fields.slug}/`,
    // remove markdown syntax for better search results
    content: removeMd(post.fields.body),
    title: post.fields.title,
    // use the entry id as objectID
    objectID: post.sys.id
  }));
} catch (err) {
  console.error(err);
}
The body of a post entry includes markdown syntax which can affect your search results. You can use remove-markdown to transform it to plain text. Other data manipulations are possible, too, because you have access to the whole npm ecosystem.
Algolia sets an objectID for every indexed record. If you don’t provide an object id, the service creates one for you. This has the downside that you have to keep track of the id Algolia sets in case of an update on your end.
It’s recommended to map the entry id (entry.sys.id) to the objectID to make it easier to relate indexed records in Algolia to Contentful entries.
Index data in Algolia
To use Algolia's API in a Node.js script, you can either call the endpoints described in the API documentation directly or use the packages and SDKs Algolia provides. The npm package algoliasearch provides a nice abstraction layer on top of the Algolia API including the addObjects method which you can use to create new records or update existing ones.
// ...
const algoliasearch = require('algoliasearch');
const {
  ALGOLIA_APP_ID, // 'VPI...'
  ALGOLIA_ADMIN_KEY, // '3c3...'
  ALGOLIA_INDEX, // 'posts'
  // ...
} = process.env;
const algoliaClient = algoliasearch(
  ALGOLIA_APP_ID,
  ALGOLIA_ADMIN_KEY
);
const algoliaIndex = algoliaClient.initIndex(ALGOLIA_INDEX);
try {
  // fetch entries from Contentful
  const { items } = await ctfClient.getEntries({...})
  // map and manipulate the data you’re interested in indexing in Algolia
  const posts = items.map(post => ({...}));
  // index all the fetched entries and log result
  const indexedContent = await algoliaIndex.addObjects(posts);
  console.log('Indexed Content:', indexedContent);
} catch (err) {
  console.error(err);
}
After running the script you have the following data available in every record in Algolia: url, plain-text content, title and objectID. By default, if not configured otherwise, Algolia treats every attribute as searchable. It’s recommended to tweak these settings because information such as image paths or URLs can be very useful to display search results later but doesn’t necessarily have to be considered when your users enter a search term.
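One way to restrict the search to relevant attributes is the searchableAttributes index setting. The sketch below assumes an index object created with initIndex as in the snippets of this guide, which offers a setSettings method; the helper name configureSearchableAttributes is hypothetical. The same setting can also be changed in the Algolia web interface.

```javascript
// Restrict full-text search to the attributes users actually search in.
// `index` is expected to be an Algolia index object created via initIndex().
async function configureSearchableAttributes(index) {
  return index.setSettings({
    // Attributes listed first are considered more important for ranking.
    // `url` and `objectID` stay retrievable but are no longer searched.
    searchableAttributes: ['title', 'content']
  });
}

// Usage, assuming `algoliaIndex` was created as shown earlier:
// await configureSearchableAttributes(algoliaIndex);
```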
The complete script – ready to use
This is the complete script to index entries of the content type post in Algolia.
#!/usr/bin/env node
(async () => {
  const algoliasearch = require('algoliasearch');
  const { createClient } = require('contentful');
  const removeMd = require('remove-markdown');
  const {
    ALGOLIA_APP_ID,
    ALGOLIA_ADMIN_KEY,
    ALGOLIA_INDEX,
    CTF_SPACE_ID: space,
    CTF_CDA_TOKEN: accessToken
  } = process.env;
  const algoliaClient = algoliasearch(
    ALGOLIA_APP_ID,
    ALGOLIA_ADMIN_KEY
  );
  const algoliaIndex = algoliaClient.initIndex(ALGOLIA_INDEX);
  const ctfClient = createClient({
    space,
    accessToken
  });
  try {
    const { items } = await ctfClient.getEntries({
      content_type: 'post',
      limit: 1000
    });
    const posts = items.map(post => ({
      url: `/blog/${post.fields.slug}/`,
      content: removeMd(post.fields.body),
      title: post.fields.title,
      objectID: post.sys.id
    }));
    const indexedContent = await algoliaIndex.addObjects(posts);
    console.log('Indexed Content:', indexedContent);
  } catch (err) {
    console.error(err);
  }
})();
The Contentful and Algolia APIs are very fast, which results in very low script execution times (indexing 150 entries takes roughly 200ms). The final script always fetches all the data and indexes it again in Algolia, though.
Frequent content updates in a short period of time may result in re-indexing the same records over and over again when you use the script above. To avoid this, you could use Contentful’s sync API or the entry information provided by a webhook to handle delta updates.
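A delta update based on the sync API could be prepared roughly as follows. The sketch assumes a sync result as returned by the sync method of the JavaScript client library, with entries, deletedEntries and nextSyncToken properties and with entry fields keyed by locale. The function name planDeltaUpdate and the en-US default locale are assumptions for this example.

```javascript
// Turn a sync result into the changes that have to be applied to the index.
function planDeltaUpdate(syncResult, locale = 'en-US') {
  const toSave = syncResult.entries
    // The sync result can contain entries of every content type.
    .filter(entry => entry.sys.contentType.sys.id === 'post')
    .map(entry => ({
      objectID: entry.sys.id,
      // Fields in sync results are keyed by locale.
      title: entry.fields.title[locale],
      url: `/blog/${entry.fields.slug[locale]}/`
    }));

  // Deleted entries only carry their id, which matches the objectID in Algolia.
  const toDelete = syncResult.deletedEntries.map(entry => entry.sys.id);

  // Store the token and pass it to the next sync call to receive only changes.
  return { toSave, toDelete, nextSyncToken: syncResult.nextSyncToken };
}

// Hypothetical sync result with one changed post, one other entry and one deletion.
const plan = planDeltaUpdate({
  entries: [
    {
      sys: { id: 'post-1', contentType: { sys: { id: 'post' } } },
      fields: { title: { 'en-US': 'Hello world' }, slug: { 'en-US': 'hello-world' } }
    },
    {
      sys: { id: 'author-1', contentType: { sys: { id: 'author' } } },
      fields: { name: { 'en-US': 'Jane Doe' } }
    }
  ],
  deletedEntries: [{ sys: { id: 'post-9' } }],
  nextSyncToken: 'token-2'
});
console.log(plan);
```

The records in toSave can then be sent to Algolia, and the ids in toDelete removed from the index, so that only changed content is touched.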
Algolia and Contentful – a powerful combination
With Algolia and Contentful you have two API tools in your belt that give you the content modeling flexibility to tailor your data to your product’s needs, combined with a powerful, fast search to engage your users.