Ignore Rules

Default Behavior

By default, we will render and cache every request you forward to our service. If you don't want to render a particular part or segment of your website, we provide multiple options to do so:

  • You can ignore URL parameters.
  • You can ignore URLs based on rules.
  • You can configure your middleware not to send requests to Prerender.io.

Ignore specific URL parameters

A URL parameter is a key-value segment in the query string of a URL, like parameterX=Y in this example: https://example.com/path/file.html?parameterX=Y&parameterZ=42
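
For a quick illustration (this uses the standard URL API available in Node and browsers, not Prerender.io code), you can read those parameters like this:

    const url = new URL("https://example.com/path/file.html?parameterX=Y&parameterZ=42");
    console.log(url.searchParams.get("parameterX")); // "Y"
    console.log(url.searchParams.get("parameterZ")); // "42"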

You can configure which parameter triggers which behavior. The most common usage is Ignore when Caching, which means you don't consider a page different from another when this parameter is present.

In practice, if you configure the from parameter to be ignored, the following URLs will be seen as the same by our system:

https://yoursite.com/blog?from=home
https://yoursite.com/blog
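
To illustrate the idea, here is a minimal sketch of how Ignore when Caching could work: the ignored parameter is stripped before the URL is used as a cache key. The cacheKey function is a hypothetical illustration, not our actual implementation.

    // Hypothetical sketch: strip ignored parameters so equivalent URLs
    // share one cache entry.
    function cacheKey(rawUrl: string, ignoredParams: string[]): string {
      const url = new URL(rawUrl);
      for (const param of ignoredParams) {
        url.searchParams.delete(param);
      }
      return url.toString();
    }

    cacheKey("https://yoursite.com/blog?from=home", ["from"]); // "https://yoursite.com/blog"
    cacheKey("https://yoursite.com/blog", ["from"]);           // "https://yoursite.com/blog"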

How to Create a New Parameter Ignore Rule

Visit your Prerender.io dashboard.

Click the Add Parameter button.


Fill in the name of the parameter you want to ignore.


Verify that the new parameter appears in the list.

Please Be Aware

Configuration changes are not instantly applied. It may take up to 59 minutes before the newly added parameter rule is applied to your environment.

Commonly Ignored Parameters

We have identified a set of URL parameters that are commonly used for analytics. These are ignored by default for accounts created after January 2022.

Parameter      Commonly Used By      Recommended Behavior
utm_medium     Google Analytics      Ignore when caching
utm_source     Google Analytics      Ignore when caching
utm_campaign   Google Adwords        Ignore when caching
utm_content    Google Adwords        Ignore when caching
gclid          Google Adwords        Ignore when caching
fbclid         Facebook / Pixel      Ignore when caching
utm_term       Google Analytics      Ignore when caching

Members with older accounts!

Every account registered after January 2022 has a set of common URL parameters configured to be ignored, but accounts created before this date will have to add these parameters manually.
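
If you maintain an older account, the defaults from the table above can be kept in one place and fed into a normalization step such as the cacheKey sketch shown earlier (the constant name here is just an illustration):

    // The commonly ignored analytics parameters listed above.
    const DEFAULT_IGNORED_PARAMETERS = [
      "utm_medium", "utm_source", "utm_campaign",
      "utm_content", "gclid", "fbclid", "utm_term",
    ];

    // Reusing the cacheKey sketch from earlier:
    cacheKey("https://yoursite.com/blog?utm_source=google", DEFAULT_IGNORED_PARAMETERS);
    // "https://yoursite.com/blog"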


Ignore and Respond 404

Sometimes you may want to remove pages from search engine results, for example because a non-useful page was accidentally served to a search engine.
For this case, we provide URL rule matching that serves an HTTP 404 response to requests routed to a matching URL.

You can configure this in your dashboard.
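
Conceptually, such a rule behaves like the following sketch (a plain Node HTTP server with hypothetical rule values, shown only to illustrate the 404 response):

    import * as http from "http";

    const ignore404Rules = ["/drafts/", "/internal/"]; // hypothetical example values

    http.createServer((req, res) => {
      const url = req.url ?? "";
      if (ignore404Rules.some((rule) => url.includes(rule))) {
        res.writeHead(404);          // matching URLs get an HTTP 404
        return res.end("Not Found");
      }
      res.end("rendered page");      // everything else renders as usual
    }).listen(3000);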

Contain Match Example

The contain match type searches for the given value anywhere in the full URL, including the domain, path, and parameters.

Contain rules also match inside parameter values.

Use the rule tester to ensure you are not matching any unwanted pages.
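
In other words, a contain rule is a plain substring check against the full URL, roughly like this sketch:

    function containMatch(fullUrl: string, value: string): boolean {
      return fullUrl.includes(value); // substring match over domain, path, and parameters
    }

    containMatch("https://example.com/blog?from=home", "from=home"); // true (parameters count too)
    containMatch("https://example.com/blog", "shop");                // false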

Wildcard Match Example

Wildcard rules can contain * as a special character to match patterns.


Here are some example rules with explanations:

Pattern                 Effect
*xyzxyz*                Ignore all URLs that contain xyzxyz
http://*                Ignore all URLs that start with http://. Useful if you don't want to cache plain HTTP pages
*.aspx                  Ignore all URLs that end with .aspx
https://example.com/*   Ignore all URLs that start with https://example.com/. Useful for filtering out unwanted domain names

Note: You need to start your pattern with * if you don't want it to match from the beginning of the URL. For example, the rule example.com/* won't ignore https://example.com/something because the rule doesn't start with *.
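
A sketch of how such wildcard matching can be interpreted (our exact matcher may differ; this follows the behavior described above, where the pattern is anchored to both ends of the URL and * matches any run of characters):

    function wildcardMatch(fullUrl: string, pattern: string): boolean {
      // Escape regex metacharacters except '*', then turn '*' into '.*'.
      const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      const regex = new RegExp("^" + escaped.split("*").join(".*") + "$");
      return regex.test(fullUrl);
    }

    wildcardMatch("https://example.com/something", "example.com/*");  // false: no leading *
    wildcardMatch("https://example.com/something", "*example.com/*"); // true
    wildcardMatch("https://example.com/page.aspx", "*.aspx");         // true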

Regular expressions

If the options above are not enough for you, we can set up rules based on regular expressions. Please get in touch with our support team if you need such rules.
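
As a rough idea of the extra power this gives, a single regular expression can express what would otherwise take several wildcard rules (the rule below is a hypothetical example, not something you can configure yourself):

    // Ignore .aspx and .jsp URLs in one rule, with or without a query string.
    const rule = /\.(aspx|jsp)(\?.*)?$/;
    rule.test("https://example.com/page.aspx");      // true
    rule.test("https://example.com/page.jsp?id=42"); // true
    rule.test("https://example.com/page.html");      // false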


Configure The Middleware

The best and most versatile way to control what is rendered and what is not is to configure your middleware to only route requests that need to appear in search engine results or be displayed on social sites. You can read more about this in the middlewares article.

But if you do not want to deploy new code every time your SEO preferences change, use the solutions mentioned above.
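
For example, with the Node/Express middleware (prerender-node) you can keep whole sections of your site from ever being forwarded to Prerender.io; the blacklisted() patterns below are hypothetical examples:

    import express from "express";
    // prerender-node may not ship TypeScript types, so a plain require keeps the sketch simple.
    const prerender = require("prerender-node");

    const app = express();

    app.use(
      prerender
        .set("prerenderToken", "YOUR_TOKEN")
        .blacklisted(["^/admin", "^/search"]) // these paths are never forwarded to Prerender.io
    );

    app.listen(3000);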


Robots.txt for Good Bots only!

We recommend you configure your robots.txt, which well-behaving search engine crawlers respect.
But please be aware that this does not mean no robot will visit the URL; it ensures that well-behaving bots like Googlebot will not crawl URLs disallowed in the robots.txt.

Our system does not read or interact with your robots.txt in any way, but configuring one is generally recommended for the best SEO results.

You can read more about its intended behavior and use-cases in this Google Search Central article.
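
A minimal robots.txt along these lines might look like this (the disallowed paths are hypothetical examples):

    # Well-behaving crawlers such as Googlebot respect these directives.
    User-agent: *
    Disallow: /internal/
    Disallow: /drafts/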

