Ignore Rules
  • 09 Aug 2022
  • 3 Minutes to read

Default Behavior

By default, we render and cache every request you forward to our service. If you don't want to render a particular part of your website, we provide multiple options:

  • You can ignore URL parameters.
  • You can ignore URLs based on rules.
  • You can configure your middleware not to send requests to Prerender.io.

Ignore specific URL parameters

A URL parameter is a key-value pair in the query string of a URL, as in this example: https://example.com/path/file.html?parameterX=Y&parameterZ=42, where parameterX and parameterZ are the parameters.

You can configure which parameter triggers which behavior. The most common usage is Ignore when Caching, which tells our system not to consider a page different from another just because this parameter is present.

In practice, if you configure the from parameter to be ignored, then URLs such as https://example.com/page?from=newsletter and https://example.com/page will be seen as the same by our system.
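The effect of ignoring a parameter when caching can be sketched as a cache-key normalization step. This is a minimal illustration, not our actual implementation; the cache_key function and IGNORED_PARAMS set are hypothetical names for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of parameters configured as "Ignore when Caching".
IGNORED_PARAMS = {"from"}

def cache_key(url: str) -> str:
    """Strip ignored query parameters so equivalent URLs share one cache entry."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

# Both URLs normalize to the same cache key:
print(cache_key("https://example.com/page?from=newsletter"))  # https://example.com/page
print(cache_key("https://example.com/page"))                  # https://example.com/page
```

Any parameter not in the ignored set is preserved, so pages that genuinely differ by a parameter (for example, a page number) still get separate cache entries.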


How to Create a New Parameter Ignore Rule

  1. Visit the Dashboard.
  2. Click the Add Parameter button.
  3. Fill in the parameter's string (value).
  4. Verify the new rule appears in the list.

Please Be Aware

Configuration changes are not applied instantly. It may take up to 59 minutes before a newly added parameter rule takes effect in your environment.

Commonly Ignored Parameters

We have identified a set of URL parameters that are commonly used for analytics. These are ignored by default for accounts created after January 2022.

Parameter      Commonly Used By    Recommended Behavior
utm_medium     Google Analytics    Ignore when caching
utm_source     Google Analytics    Ignore when caching
utm_campaign   Google Adwords      Ignore when caching
utm_content    Google Adwords      Ignore when caching
gclid          Google Adwords      Ignore when caching
fbclid         Facebook / Pixel    Ignore when caching
utm_term       Google Analytics    Ignore when caching
Members with older accounts!

Every account registered after January 2022 has a set of common URL parameters configured to be ignored, but accounts created before this date will have to add these parameters manually.

Ignore and Respond 404

Sometimes you may want to remove pages from search engine results, for example when a non-useful page was accidentally served to a search engine.
In this case, we provide URL rule matching that serves an HTTP 404 response for requests routed to a matching URL.

You can configure this in your dashboard.

Contain Match Example

The contain match type searches for the given value anywhere in the full URL, including the domain, path, and query parameters.

The rule also matches inside parameter values.

Use the rule tester to ensure you are not matching any unwanted pages.
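Contain matching behaves like a plain substring test over the full URL. A minimal sketch, assuming the rule value is compared verbatim against the whole URL string:

```python
# Hypothetical sketch of "contain" matching: a rule matches whenever its value
# appears anywhere in the full URL (domain, path, or query parameters).
def contains_match(rule: str, url: str) -> bool:
    return rule in url

print(contains_match("/admin", "https://example.com/admin/login"))  # True (path)
print(contains_match("page=3", "https://example.com/list?page=3"))  # True (parameter)
print(contains_match("/admin", "https://example.com/blog"))         # False
```

Because the test covers the entire URL, a short rule value like "ad" would also match inside "download" or "?campaign=adwords", which is exactly why the rule tester is worth using before saving a rule.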


Wildcard Match Example

Wildcard rules can contain * as a special character to match patterns.


Here are some example rules with explanations:

*xyzxyz*                Ignore all URLs that contain xyzxyz
http://*                Ignore all URLs that start with http://. Useful if you don't want to cache plain HTTP pages
*.aspx                  Ignore all URLs that end with .aspx
https://example.com/*   Ignore all URLs that start with https://example.com/. Useful for filtering out unwanted domain names

Note: You need to start your pattern with a * if you don't want it to match from the beginning of the URL. So using example.com/* as a rule won't ignore https://example.com/something, because the rule doesn't start with a *.
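The anchoring behavior described in the note can be sketched by translating the wildcard pattern into a regular expression anchored at both ends. This is an illustrative model of the matching rules above, not our actual implementation:

```python
import re

# Hypothetical sketch of wildcard matching: '*' matches any run of characters,
# and the pattern must match the URL from its very beginning to its end.
def wildcard_match(pattern: str, url: str) -> bool:
    regex = "^" + ".*".join(re.escape(part) for part in pattern.split("*")) + "$"
    return re.match(regex, url) is not None

print(wildcard_match("*.aspx", "https://example.com/page.aspx"))         # True
print(wildcard_match("https://example.com/*", "https://example.com/x"))  # True
print(wildcard_match("example.com/*", "https://example.com/x"))          # False: no leading *
```

The third case shows why the note matters: without a leading *, the pattern is anchored to the start of the URL string, so it never matches a URL that begins with the scheme.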

Regular expressions

If the options above are not enough, we can set up rules based on regular expressions. Please contact our support team if you need such rules.

Configure The Middleware

The most versatile way to control what is rendered is to configure your middleware so that it routes only the requests that need to appear in search engine results or be displayed on social sites. You can read more about this in the middlewares article.

If you do not want to deploy new code every time your SEO preferences change, use the solutions described above.
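The middleware decision typically comes down to two checks: is the request from a known crawler, and is the path one you want rendered? A minimal sketch, where the bot list, the extension list, and the should_prerender function are hypothetical names for this example:

```python
# Hypothetical middleware sketch: forward a request to Prerender only when it
# comes from a known crawler AND the path is one you want indexed.
BOT_AGENTS = ("googlebot", "bingbot", "facebookexternalhit", "twitterbot")
IGNORED_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".xml")

def should_prerender(user_agent: str, path: str) -> bool:
    ua = user_agent.lower()
    if not any(bot in ua for bot in BOT_AGENTS):
        return False  # regular visitors get the normal application
    return not path.endswith(IGNORED_EXTENSIONS)  # skip static assets and feeds

print(should_prerender("Googlebot/2.1", "/products/42"))  # True
print(should_prerender("Mozilla/5.0", "/products/42"))    # False
print(should_prerender("Googlebot/2.1", "/sitemap.xml"))  # False
```

Filtering in the middleware like this means ignored requests never reach our service at all, which is the most direct way to keep them out of the render counter.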

Robots.txt for Good Bots only!

We recommend you configure your robots.txt, which well-behaved search engine crawlers respect.
But please be aware that this does not mean no robot will visit the URL; it only ensures that bots like Googlebot will not show URLs disallowed in the robots.txt in search results.

Our system does not read or interact with the robots.txt at all, but configuring it is generally recommended for the best SEO results.
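As an illustration, a minimal robots.txt that disallows all crawlers from a hypothetical /private/ path while leaving the rest of the site crawlable might look like this:

```
User-agent: *
Disallow: /private/
```

Remember that this only affects well-behaved crawlers; it does not block requests from reaching your site or our service.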

You can read more about its intended behavior and use-cases in this Google Search Central article.

Render Counter

URLs matched by ignore rules do not count toward the render counter. However, please treat this functionality as a fallback for removing specific URLs or patterns; in cases of abuse or extensive usage, our team may reach out and ask you to ignore those URLs before they are forwarded to our service.
