General tips and best practices
What can go wrong?
Let's first understand how Prerender.io works to pinpoint issues in our setup.
1. A bot (e.g., Googlebot) tries to fetch your site.
2. The Prerender.io middleware intercepts the request (based on the user agent) and invokes the Prerender cloud service to fetch your site's content.
3. The Prerender.io service fetches your site using the Prerender.io user agent, but it only renders, caches, and stores the DOM; it doesn't cache images or any other static assets.
4. The bot that initiated the original fetch in step 1 then tries to render and index the page. It reaches out to your site again to fetch all the static content that we don't handle.
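The interception in step 2 boils down to a user-agent check plus a static-asset check. Below is a hypothetical sketch of that logic, not the actual Prerender middleware; the bot list and extension list are assumptions you would adapt to your stack.

```python
# Hypothetical sketch of the middleware decision in step 2 (not the real
# Prerender middleware): forward a request to the Prerender service only
# when it comes from a known bot and is not a static-asset request.
BOT_USER_AGENTS = ["googlebot", "bingbot", "yandex"]          # assumed list
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".svg")   # assumed list

def should_prerender(user_agent: str, path: str) -> bool:
    ua = user_agent.lower()
    is_bot = any(bot in ua for bot in BOT_USER_AGENTS)
    is_static = path.lower().endswith(STATIC_EXTENSIONS)
    return is_bot and not is_static
```

This is also why middleware order matters: if the static-file middleware runs first, the bot's request never reaches this check.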
Now, what can break in these steps:
- Your servers block bot requests (e.g. Googlebot). If this happens, Prerender doesn't even have a chance of intercepting a request.
- Your server doesn't have access to the Internet. In this case, it can't reach out to Prerender and can't request us to fetch and render your page. Make sure that you allow your servers access to service.prerender.io at least. There is also the possibility (depending on your tech stack) that your middlewares are attached in the wrong order. The Prerender middleware has to be attached before the middleware that serves your JS files (or anything that needs to be rendered), and in a way that your REST API or similar middlewares don't get prerendered, since prerendering API responses doesn't make much sense.
- This is more or less similar to the first point above. If your firewall/CDN/ingress blocks non-standard user agents (please see our user agent string here), or it has IP filtering enabled and Prerender servers are not allowed to reach your site, then this step will fail. Geo-based filtering is another possible cause: Prerender.io servers are located all around the globe, so a Prerender server hosted in Germany may try to fetch your geo-filtered US-based server and fail. Keep an eye on it!
- Some CDNs block non-standard user agents as well. We usually watch out for this and reach out to them when it happens, but you may identify the issue before we do. To test whether this is happening, fetch your page with the Prerender user agent and check the console for JS errors.
- As we only render and cache the HTML content, bots (like Googlebot) will still reach out to your server to fetch the rest of your page. This can still fail due to user agent or IP filtering.
My .htaccess file doesn't work.
There are a few common problems that may cause this:
- Modules are not enabled (headers_module, proxy_module, proxy_http_module, ssl_module, rewrite_module)
- .htaccess files are not enabled (see: Enable .htaccess)
- There are virtual domains or domain path mappings enabled on your domain. This is a bit more difficult to identify and tackle. Some Apache reverse proxies map multiple domains, or map to multiple reverse paths, when a request is handled. E.g., requesting blog.example.com ends up at blog.intranet.local/blog.html, while requesting app.example.com is reverse proxied to app.intranet.local/angular.html. Adding a .htaccess file to any of these locations will make Prerender request the wrong URL from your server (it will try to fetch blog.example.com). How to solve it? You tell us! It's your custom stack; you'll probably need to build some custom redirect rules instead of using the standard .htaccess file we usually suggest.
- We've seen cases where customers had multiple rewrite rules, structured in a way that the Prerender rewrite rule was never executed.
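For reference, Prerender-style .htaccess rewrite rules typically look roughly like the sketch below. This is a simplified assumption, not the exact snippet Prerender.io distributes; the bot list, asset extensions, and example.com domain are placeholders. Note how the [P] proxy flag depends on proxy_module and proxy_http_module, and the whole block on rewrite_module, which is why the module list above matters.

```apache
# Simplified sketch of Prerender-style rewrite rules (an assumption, not
# the exact snippet Prerender.io suggests).
<IfModule mod_rewrite.c>
  RewriteEngine On
  # Only proxy requests coming from known bots...
  RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot) [NC]
  # ...that are not requests for static assets.
  RewriteCond %{REQUEST_URI} !\.(js|css|png|jpg|svg)$ [NC]
  # Proxy the matching request to the Prerender service.
  RewriteRule ^(.*)$ https://service.prerender.io/https://www.example.com/$1 [P,L]
</IfModule>
```

If an earlier RewriteRule in the same file matches first and carries an [L] flag, this rule never executes, which is exactly the multiple-rewrite-rules problem described above.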
I have a staging/dev site that I want to be prerendered but not available from the Internet.
Well, that's tough. Since the Prerender service is hosted on the public Internet, you definitely have to allow it to access your site. You can do that by defining allow rules based on the Prerender user agent. If you have a smart firewall, you can also do a reverse lookup on the requesting IP address and ensure the resulting hostname ends with prerender.io. This is the same flow Google suggests for identifying Googlebot here.
Although this will let the Prerender cloud service properly fetch and cache your site, you still won't be able to see exactly how a bot will index it as you are still blocking those from accessing your site. Here is a list of the user agents we use in our middlewares: GitHub.
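The reverse-lookup check can be sketched in Python. The full flow (reverse lookup, then forward-confirm) mirrors Google's Googlebot verification advice; the socket calls need working DNS, so treat `verify_prerender_ip` as a sketch rather than a drop-in firewall rule.

```python
import socket

def hostname_is_prerender(hostname: str) -> bool:
    # True only if the reverse-DNS hostname belongs to prerender.io.
    host = hostname.rstrip(".").lower()
    return host == "prerender.io" or host.endswith(".prerender.io")

def verify_prerender_ip(ip: str) -> bool:
    # Sketch of the full flow (requires DNS): reverse-lookup the IP, check
    # the domain, then forward-resolve the hostname and confirm it maps
    # back to the same IP, as Google suggests for verifying Googlebot.
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    if not hostname_is_prerender(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

The forward-confirmation step matters: without it, anyone controlling reverse DNS for their own IP range could claim a prerender.io hostname.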
How can I test how my page is prerendered?
We usually test it in two ways. The programmatic way is to fetch your site with curl or Postman. These tools are equally good for testing, although Postman is easier to use as it has a graphical user interface.
To test it within a browser, you can use Chrome's built-in feature to override the user agent (more info here). If that seems complicated, you can also install a user agent switcher (we suggest Google's one here), set up a new user agent that looks like a bot (e.g., Googlebot), and fetch your page.
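If you prefer scripting the programmatic test, the same idea works from Python's standard library. The URL below is a placeholder; replace it with your own page, and note that the actual fetch is commented out because it needs network access.

```python
import urllib.request

# Build a request that impersonates Googlebot; www.example.com is a
# placeholder, replace it with your own URL before running.
req = urllib.request.Request(
    "https://www.example.com/",
    headers={"User-Agent": "Googlebot"},
)
# Uncomment to actually fetch the (hopefully prerendered) page:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8", "replace")[:500])
```

If your middleware is wired up correctly, the response should contain the fully rendered HTML rather than an empty JS shell.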
I see unwanted links cached by Prerender.
- Use the URL Parameters section of our site to tell us to ignore certain query parameters that may show up on your URLs but don't affect the page rendering. That way, we can save a more canonical URL and avoid caching lots of near-duplicate URLs.
- Modify your middleware to only send certain URLs to Prerender.io. This would mean that some pages would be prerendered and some wouldn't, but if certain parts of your site aren't valuable for SEO, you might consider not sending those URLs to Prerender.io.
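The URL Parameters idea can be illustrated with a small sketch: strip the query parameters that don't affect rendering, so variants of the same page collapse into one canonical cache entry. The parameter names below are hypothetical examples, not a list Prerender.io ships.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical examples of parameters that don't change what renders.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}

def canonical_url(url: str) -> str:
    # Drop ignored query parameters so similar URLs share one cache entry.
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in IGNORED_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `/product?id=1&utm_source=newsletter` and `/product?id=1&utm_source=ads` would both be cached under `/product?id=1`.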
Prerender gives back raw or empty content.
If Prerender returns an
X-Prerender-Raw-Data header, it means our renderer could not render the page and returned the original (raw) content instead. This happens because we still want to serve some version of the page, even if we cannot render it.
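To detect this case programmatically, check the response headers for X-Prerender-Raw-Data. A minimal sketch; the lookup is case-insensitive because HTTP header names are:

```python
def served_raw_content(response_headers: dict) -> bool:
    # True when Prerender signalled that it fell back to the raw,
    # unrendered content (the X-Prerender-Raw-Data header is present).
    return any(name.lower() == "x-prerender-raw-data"
               for name in response_headers)
```

Seeing this header is a good prompt to check your page for JS errors under the Prerender user agent, as described in the testing section above.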