General tips and best practices
  • 21 Dec 2023

What can go wrong?

Let's first understand how Prerender.io works to pinpoint issues in our setup.

  1. A bot tries to fetch your site (e.g., Googlebot).
  2. The Prerender.io middleware intercepts the request (based on the user agent) and invokes the Prerender cloud service to fetch your site's content.
  3. The Prerender.io service fetches your site using the Prerender.io user agent. It only renders, caches, and stores the DOM; it doesn't cache images or any other static assets.
  4. The Prerender service renders all JavaScript content on your page and returns the fetched and prerendered DOM content to the middleware, which responds to the originally requesting bot (e.g. Google).
  5. The bot that initiated the original fetch in point 1 then tries to render and index the page. It reaches out again to your site to fetch all the static content that we don't handle.
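
The routing decision in step 2 can be pictured with a minimal Python sketch. It is not the code of any official middleware: the marker list is a shortened, hypothetical one, and the layout of the service URL (service base followed by the full page URL) is an assumption to verify against the middleware you actually use. Authentication is left out entirely.

```python
# Hypothetical, shortened list of crawler markers; real middlewares ship a longer list.
BOT_MARKERS = ("googlebot", "bingbot", "twitterbot", "facebookexternalhit")

# Assumed base URL of the Prerender cloud service.
PRERENDER_SERVICE = "https://service.prerender.io/"


def is_bot(user_agent: str) -> bool:
    """Return True when the user agent contains one of the known crawler markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def route(user_agent: str, page_url: str) -> str:
    """Return the URL that should answer this request.

    Bots are sent to the Prerender service (assumed URL layout);
    everyone else gets the page straight from the origin.
    """
    if is_bot(user_agent):
        return PRERENDER_SERVICE + page_url
    return page_url
```

Calling `route()` with a Googlebot user agent returns a URL on the Prerender service, while a regular browser user agent gets the original URL back unchanged.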

Now, what can break in these steps:

  1. Your servers block bot requests (e.g. Googlebot). If this happens, Prerender doesn't even have a chance of intercepting a request.
  2. Your server doesn't have access to the Internet. In this case, it can't reach Prerender to ask us to fetch and render your page. Make sure your servers can access at least service.prerender.io. Depending on your tech stack, your middlewares may also be attached in the wrong order. The Prerender middleware has to be attached before the middleware that serves your JS files (or anything else that needs to be rendered), and in a way that your REST API or similar middlewares don't get prerendered, since prerendering API responses serves no purpose.
  3. This is more or less similar to point 1. If your firewall/CDN/ingress blocks non-standard user agents (please see our user agent string here), or it has IP filtering enabled and Prerender servers are not allowed to reach your site, then this step will fail. Geo-based filtering is another reason for this step to fail. Prerender.io servers are located all around the globe, so a Prerender server hosted in Germany may try to fetch a page from your geo-filtered, US-based server and fail. Keep an eye on it!
  4. Some CDNs block non-standard user agents as well. We usually watch out for it and reach out to them if this happens, but you may identify this issue before we do. To test for this, fetch your page with the Prerender user agent and check the console for JS errors.
  5. As we only render and cache the HTML content, bots (like Googlebot) will still reach out to your server to fetch the rest of your page. This can still fail due to user agent or IP filtering.
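
The middleware-order problem from point 2 can be illustrated with a small Python sketch. The prefixes and file extensions are hypothetical values chosen for the example, and `handle()` only names which handler would answer; it is not taken from any Prerender middleware.

```python
import posixpath

# Hypothetical values for illustration only.
API_PREFIXES = ("/api/",)
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".svg", ".ico", ".woff2"}


def should_prerender(path: str) -> bool:
    """True for HTML pages, False for API routes and static assets."""
    if any(path.startswith(prefix) for prefix in API_PREFIXES):
        return False
    extension = posixpath.splitext(path)[1].lower()
    return extension not in STATIC_EXTENSIONS


def handle(path: str, is_bot: bool) -> str:
    """Name the handler that answers the request.

    The Prerender check runs first, before the handlers that
    serve the API and the JavaScript application.
    """
    if is_bot and should_prerender(path):
        return "prerender"
    if any(path.startswith(prefix) for prefix in API_PREFIXES):
        return "api"
    return "app"
```

If the Prerender check came after the `"app"` branch, bots would always receive the unrendered application shell, which is the failure described above.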

My .htaccess file doesn't work.

There are a few common problems that may cause this:

  1. Modules are not enabled (headers_module, proxy_module, proxy_http_module, ssl_module, rewrite_module)
  2. .htaccess files are not enabled (See: Enable .htaccess)
  3. Virtual domains or domain path mapping are enabled on your domain. Well, this is a bit more difficult to identify and tackle. Some Apache reverse proxies map multiple domains, or map to multiple reverse paths, when a request is handled. E.g. requesting blog.example.com ends up in blog.intranet.local/blog.html, while requesting app.example.com is reverse proxied to app.intranet.local/angular.html. Adding a .htaccess file to any of these locations will make Prerender request the wrong URL from your server (it will try to fetch blog.intranet.local/blog.html instead of blog.example.com). How to solve it? You tell us! It's your custom stack, so you'll probably need to build some custom redirect rules instead of using the standard .htaccess file we usually suggest.
  4. We've seen cases where customers had multiple rewrite rules, structured in a way that the Prerender rewrite rule was never executed.
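
To check point 1 quickly, a short Python sketch can compare the modules Apache reports as loaded with the ones listed above. It assumes you pass in the text printed by `apachectl -M` (the command is `apache2ctl -M` on some distributions); the function name is illustrative.

```python
# Modules named in point 1 of the list above.
REQUIRED_MODULES = (
    "headers_module",
    "proxy_module",
    "proxy_http_module",
    "ssl_module",
    "rewrite_module",
)


def missing_modules(apachectl_output: str) -> list[str]:
    """Return the required modules that do not appear in `apachectl -M` output."""
    loaded = set()
    for line in apachectl_output.splitlines():
        parts = line.split()
        # Module lines look like " rewrite_module (shared)".
        if parts and parts[0].endswith("_module"):
            loaded.add(parts[0])
    return [name for name in REQUIRED_MODULES if name not in loaded]
```

An empty result means all five modules are loaded; it says nothing about whether .htaccess files are enabled or whether another rewrite rule shadows the Prerender one.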

I have a staging/dev site that I want to be prerendered but not available from the Internet.

Well, that's tough. Since the Prerender service is hosted on the public Internet, you definitely have to allow it to access your site. You can do that by defining allow rules based on the Prerender user agent. If you have a smart firewall, you can do a reverse lookup for the requesting IP address and ensure the hostname ends with prerender.io. This is the same flow that Google suggests for identifying Googlebot.
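
The reverse lookup can be sketched in Python as follows. This is an illustration of the idea, not a firewall rule: the function name is made up, the forward confirmation step mirrors what Google describes for Googlebot, and `socket.gethostbyname_ex` only resolves IPv4 addresses. The resolvers are passed in as parameters so the logic can be tried without touching DNS.

```python
import socket
from typing import Callable, List


def _reverse(ip: str) -> str:
    """Reverse DNS lookup: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    """Forward DNS lookup: hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_prerender_ip(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
    suffix: str = ".prerender.io",
) -> bool:
    """Check that the IP reverse-resolves to *.prerender.io and resolves back to itself."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # lookup failed (socket.herror is a subclass of OSError)
        return False
    if not hostname.endswith(suffix):
        return False
    try:
        # Forward-confirm, so a spoofed PTR record alone is not enough.
        return ip in forward(hostname)
    except OSError:
        return False
```

Matching on the suffix `.prerender.io` (with the leading dot) avoids accepting look-alike domains that merely end in the same letters.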

Although this will let the Prerender cloud service properly fetch and cache your site, you still won't be able to see exactly how a bot will index it, as you are still blocking those bots from accessing your site. Here is a list of the user agents we use in our middlewares: GitHub.

How can I test how my page is prerendered?

We usually test it two ways. The programmatic way is to fetch your site with curl or Postman. These tools are equally great for testing, although Postman is easier to use as it has a graphical user interface.

To test it in a browser, you can use Chrome's built-in feature to override the user agent (more info here). If that seems complicated, you can also install a user agent switcher (we suggest Google's one here), set up a new user agent that looks like a bot (e.g. Googlebot), and fetch your page.

When doing any of these, a good sign that we prerender your page is that it contains the actual page content (instead of the usual empty root element that most JS-based frameworks use), and it doesn't contain any script tags, except for JSON-LD scripts.
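
The same check can be scripted. The Python sketch below fetches a page with a bot-like user agent and applies the two signs described above as a rough heuristic: some visible text and no script tags other than JSON-LD. The function and class names are illustrative, and a page can pass or fail this heuristic for reasons unrelated to Prerender.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class _PageScan(HTMLParser):
    """Count non-JSON-LD script tags and visible text characters."""

    def __init__(self):
        super().__init__()
        self.executable_scripts = 0
        self.visible_text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").lower()
            if script_type != "application/ld+json":
                self.executable_scripts += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_text_chars += len(data.strip())


def looks_prerendered(html: str) -> bool:
    """Heuristic: has visible text and no script tags except JSON-LD."""
    scan = _PageScan()
    scan.feed(html)
    scan.close()
    return scan.executable_scripts == 0 and scan.visible_text_chars > 0


def fetch_as_bot(url: str, user_agent: str = GOOGLEBOT_UA) -> str:
    """Fetch a page while presenting a bot-like user agent."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `looks_prerendered(fetch_as_bot("https://example.com/"))` would run the whole check against a placeholder URL; replace it with a page of your own site.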

  1. Use the URL Parameters section of our site to tell us to ignore certain query parameters that might show up on your URLs that don't affect the page rendering. That way, we can save more of a canonical URL and not have lots of similar URLs cached.
  2. Modify your middleware to only send certain URLs to Prerender.io. This would mean that some pages would be prerendered and some wouldn't, but if certain parts of your site aren't valuable for SEO, you might consider not sending those URLs to Prerender.io.
  3. Return non-200 status codes for pages that don't need to be cached. If you have invalid URLs cached because they return a status code 200, you might want to return a non-200 status code from your server, or use a prerender-status-code meta tag to tell us to return a non-200 status code for those URLs after the JavaScript has run (see Best Practices).
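
The effect of the first tip can be pictured with a short Python sketch: once parameters that don't affect rendering are ignored, many URL variants collapse into one cached entry. The parameter names are hypothetical examples, and the function only illustrates the idea; it does not describe how Prerender.io builds its cache keys internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that don't change what the page renders.
IGNORED_PARAMS = frozenset({"utm_source", "utm_medium", "utm_campaign", "fbclid"})


def cache_key(url: str, ignored=IGNORED_PARAMS) -> str:
    """Drop ignored query parameters (and the fragment) so similar URLs share one key."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With these settings, a URL carrying only tracking parameters and the same URL without them map to the same key, so only one copy would need to be cached.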

Prerender gives back raw or empty content.

If Prerender returns an X-Prerender-Raw-Data header, the crawler could not render the page and returned the original content instead. We do this because we still want to serve a version of the page, even if we cannot render it.
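
When you script your tests, this header is easy to check for. The Python helper below is a minimal sketch with an illustrative name; it looks only at whether the header is present, because the value it carries is not described here.

```python
def served_raw_fallback(headers: dict) -> bool:
    """True when the response carries an X-Prerender-Raw-Data header (case-insensitive)."""
    return any(name.lower() == "x-prerender-raw-data" for name in headers)
```

Pass it the response headers as a mapping of names to values; a True result means the content you received is the unrendered original.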

