What is the meta robots tag? The meta robots tag lets you determine, at the webpage level, whether a webpage should be indexed and served to users as a search result. There are a few ways to use the meta robots tag, and many reasons to use it. Checking meta robots tags is part of website indexing optimization; it’s vital to make sure searchbot crawling is clear and unobstructed. SEO audits check the configuration of meta tags to ensure all on-page SEO factors are correct. The most frustrating thing, as always, is that searchbots don’t have to obey the meta robots tag. They usually do, but, as with robots.txt, they can bypass it.
There are ten directives that can be used in a meta robots tag:
- Index – Tells searchbots to index the webpage and show it as a search result
- NoIndex – Tells searchbots NOT to index the webpage; it will not be shown to users
- Follow – Tells searchbots to follow the URL links on the webpage
- NoFollow – Tells searchbots NOT to follow the URL links on the webpage
- Noimageindex – Tells searchbots NOT to index images on the webpage
- None – Shorthand for NOINDEX, NOFOLLOW, without typing as much
- Noarchive – Tells searchbots NOT to show a cached link for the webpage in SERPs
- Nocache – Same as Noarchive; historically the equivalent directive recognized by MSN/Bing rather than Google
- Nosnippet – Tells searchbots NOT to show a text snippet of the webpage in SERPs
- Unavailable_after – Tells searchbots NOT to show the webpage in search results after a specified date and time
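Several directives can be combined in a single, comma-separated content attribute. As an illustration (the Unavailable_after date format shown is one format Google has documented accepting, but verify against current documentation):

```html
<!-- Keep the page out of the index and don't index its images -->
<meta name="robots" content="noindex,noimageindex">

<!-- Allow indexing now, but stop showing the page in results after this date/time -->
<meta name="robots" content="unavailable_after: 25-Jun-2025 15:00:00 EST">
```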
The 2 most common meta robot tag configurations are:
- <meta name="robots" content="index,follow">
- <meta name="robots" content="noindex,follow">
Misconfigured Robot Tags Lead To Crawl Issues
Many times a website’s crawling issues stem from a simple plugin configuration that sets a group of webpages to noindex, nofollow, or both. The common symptom is that those specific webpages cannot be found in a search, even with advanced search operators. Sometimes a different crawling issue can mimic the symptoms of a noindex meta robots tag. You can tell the difference by looking at the crawl warnings and errors for your website in your webmaster tools; search engines will list webpages with issues that prevent searchbots from crawling them. Once those issues are fixed, searchbots can crawl the webpage and index it.
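As a hypothetical example, a plugin intended to hide only a handful of utility pages might accidentally emit the first tag below on every page of the site; correcting the plugin settings restores the default, indexable tag:

```html
<!-- Misconfigured: keeps the page out of the index AND blocks link discovery -->
<meta name="robots" content="noindex,nofollow">

<!-- Corrected: the page can be indexed and its links crawled -->
<meta name="robots" content="index,follow">
```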
It’s vital to make sure this meta tag is properly configured, and to know when it’s OK to have a webpage set to noindex. Many times, CMS websites will have extra duplicate pages created by the category and tag taxonomy features. These pages are safe to set to noindex, because Google already understands the content to be duplicate and reads the canonical tags in such cases to determine content originality. When Google sees that the category and tag pages host duplicate content, and the canonical tag says that content lives on another webpage on the domain, Google will exclude those pages from indexing. You can often see this in Google Search Console, under sitemap coverage. Why have searchbots waste crawl budget on something they will exclude anyway?
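A taxonomy archive handled this way might carry both tags together. The URLs below are hypothetical placeholders, following the scenario above where the canonical points at the original post:

```html
<!-- On a category archive page that duplicates a post's content -->
<meta name="robots" content="noindex,follow">
<link rel="canonical" href="https://example.com/original-post/">
```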