PDFs and media files can slip into search results even when we never meant them to. That usually happens because we try to apply HTML-only rules, like the meta robots tag, to files that are not HTML, and those rules only get us so far.

The fix is straightforward once we know where to look. The X-Robots-Tag header gives us direct control over PDFs, images, videos, and other non-HTML files, so we can block indexing, allow indexing, or tighten how search engines handle each asset.

When we set it up well, we clean up search visibility without guessing. That matters whether we are keeping internal documents out of Google or making sure the right file is the one that ranks. First, let’s look at what the header actually does.

What the X-Robots-Tag Header Does

The X-Robots-Tag is an HTTP response header. That means it travels with the file when the server sends it to a crawler. We use it when the asset itself needs instructions, not the HTML page around it.

That matters because PDFs, images, and videos do not give us an HTML <meta> robots tag. The header fills that gap. Google documents this behavior in its robots meta tag specifications, and its page-level granularity update explains why the header exists in the first place.

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow
Cache-Control: public, max-age=3600

That kind of response tells a crawler what to do with the file before anything is rendered. MDN’s X-Robots-Tag reference is also useful when we want a plain-language recap of the header and its common directives.


The main idea is simple. If the crawler can fetch the file, it can read the header. If it cannot fetch the file, it cannot read the instruction.
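That fetch-then-read relationship is easy to demonstrate. The sketch below, using only the Python standard library, spins up a throwaway local server that serves a dummy PDF with the header attached, then fetches it the way a crawler would; the file name and header value are illustrative.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class PDFHandler(BaseHTTPRequestHandler):
    """Serves a dummy PDF response with an X-Robots-Tag header attached."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.end_headers()
        self.wfile.write(b"%PDF-1.4 dummy")

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), PDFHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A crawler-style fetch: the instruction travels with the response itself.
url = f"http://127.0.0.1:{server.server_port}/report.pdf"
with urllib.request.urlopen(url) as resp:
    tag = resp.headers.get("X-Robots-Tag")
print(tag)  # noindex, nofollow
server.shutdown()
```

If the fetch fails, `tag` is never populated, which is the whole point: a crawler that cannot reach the file cannot read the directive.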

How We Implement X-Robots-Tag for PDFs

For PDFs, we usually set the header at the server, CDN, or application layer. The PDF file does not need HTML. It only needs the right response headers when the request is made.

That is why PDF handling feels different from page SEO. If we are used to HTML pages, it helps to compare this with our noindex tag implementation guide, because the goal is similar even though the delivery method is different. On a page, we place a meta tag in the head. On a PDF, we send a header with the file.

The most common setup is simple:

X-Robots-Tag: noindex

If we want to stop the file from appearing in search results, that is usually the cleanest approach. If we also want to reduce link following inside the file, we can add nofollow, although support can vary by crawler and document type. We should test it, not assume it behaves exactly the same everywhere.
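Most sites attach the header in Apache or Nginx config, but the same idea works at the application layer. Below is a hedged sketch of a WSGI middleware that adds `X-Robots-Tag: noindex` to any response served as a PDF; the middleware name, matching rule, and demo app are all illustrative, not a standard API.

```python
def robots_header_middleware(app, header_value="noindex"):
    """Wrap a WSGI app so PDF responses carry an X-Robots-Tag header."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            content_type = dict(headers).get("Content-Type", "")
            if content_type.startswith("application/pdf"):
                headers = headers + [("X-Robots-Tag", header_value)]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped

# Minimal demo app: every response pretends to be a PDF.
def pdf_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.4 dummy"]

app = robots_header_middleware(pdf_app)

# Drive the wrapped app directly to inspect the headers it emits.
captured = {}
def fake_start(status, headers, exc_info=None):
    captured["headers"] = headers
app({"REQUEST_METHOD": "GET"}, fake_start)
print(dict(captured["headers"]).get("X-Robots-Tag"))  # noindex
```

The key design point is that the rule keys off the response's `Content-Type`, not the URL, so it keeps working even if PDFs are served from routes without a `.pdf` extension.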


Here is a quick look at the directives we reach for most often.

| Directive | Best use | What it changes |
| --- | --- | --- |
| noindex | PDFs we do not want in search results | Keeps the file out of the index after crawl |
| nofollow | Files with links we do not want crawled through | Tells supported crawlers not to follow links in the file |
| nosnippet | Assets where we want to limit preview text | Reduces or removes snippets in results |
| indexifembedded | PDFs that are meant to be embedded on a page | Lets the file be indexed when it appears in an approved embed |

The big takeaway is this: the directive needs to match the job. If we want the file removed from search results, noindex is the starting point. If we want the PDF to support a page, not compete with it, we need to be more careful with how the file is exposed.

X-Robots-Tag for Images, Videos, and Other Media

The same header works for other non-HTML assets too. That includes images, videos, and some document formats. This is where the header becomes especially useful, because media files rarely have their own HTML wrapper.

If we run a gallery, media library, or video archive, we often have two separate goals. One is to keep the media file under control. The other is to let the supporting page rank. Those are not the same thing.

For example, an image file may need noindex, but the HTML product page that uses that image may still need to rank. In that case, we control the file, not the page. That is a good fit for modern Google guidance on non-HTML content, and it is one reason the X-Robots-Tag header keeps showing up in technical audits.


This is also where rendering gets tricky. Search engines do not render a JPG, MP4, or PDF the same way they render a page. They fetch the asset, read the response, and decide what to do next. So if the media file is blocked by auth, hidden behind the wrong rule, or stripped by the CDN, the crawler may never see the header at all.

That is why we treat the file, the page, and the delivery layer as a set. If one part is out of sync, the whole setup gets messy.

How It Fits with Robots.txt, Canonicals, and Crawl Budget

It helps to separate the tools. They solve related problems, but they do not do the same job.

| Tool | Best use | What it does not do |
| --- | --- | --- |
| robots.txt | Stop crawling of private or low-value paths | It does not remove indexed URLs by itself |
| X-Robots-Tag | Control indexing for PDFs, images, videos, and other non-HTML files | It does not block crawling if the file is accessible |
| Canonical tags | Consolidate duplicate versions of a page or file | They do not block indexing on their own |

If we are still shaping crawl access, our robots.txt SEO best practices guide is the right companion piece. robots.txt can keep crawlers out, but it cannot tell them what to do with a file they already found.

Canonicalization is the same kind of separate step. If the same PDF exists at multiple URLs, we need to decide which version is preferred. Our canonical SEO for indexing guide covers the page side of that problem, and the same thinking applies to file libraries. A canonical helps consolidate signals. It does not replace noindex.
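Since a PDF has no head section to hold a canonical tag, the canonical signal travels the same way the robots directives do: as an HTTP response header. Google supports a Link header with rel="canonical" for this. A sketch, with a placeholder URL:

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://example.com/files/report.pdf>; rel="canonical"
```

Every duplicate copy of the file can point at the one preferred URL this way, consolidating signals without removing the file from the index.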

This is where crawl budget enters the picture too. A large media library can eat crawl time fast, especially when duplicates or dated files pile up. Our crawl budget optimization strategies guide pairs well with this topic because the more noise we remove, the more likely search engines are to spend time on the assets that matter.

Troubleshooting When Files Still Show Up in Search

If a PDF or media file still appears in search results after we set the header, we usually have a delivery problem, not a search problem.

The first thing we check is the final response. The header has to be on the response that returns the file, not only on a redirect or on the page that links to the file. If a CDN, storage bucket, or application layer strips the header, the crawler never gets the message.
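This redirect pitfall can be sketched as a small check. The helper below walks a hypothetical redirect chain, modeled as (status, headers) pairs, and returns only the headers of the final non-redirect response, which is the response that matters; the chain data is invented for illustration.

```python
def final_response_headers(chain):
    """Return the headers of the final (non-redirect) response in a chain.

    Illustrates why the X-Robots-Tag must sit on the response that
    actually returns the file, not on an intermediate redirect.
    """
    for status, headers in chain:
        if status not in (301, 302, 303, 307, 308):
            return headers
    return {}

# Hypothetical chain: the redirect hop carries the header, the file does not.
chain = [
    (301, {"Location": "/files/report-v2.pdf",
           "X-Robots-Tag": "noindex"}),         # header on the redirect only
    (200, {"Content-Type": "application/pdf"}),  # final response: no header
]
print(final_response_headers(chain).get("X-Robots-Tag"))  # None
```

A setup like this looks correct in the config that issues the redirect, yet the file itself ships with no instruction at all.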

Next, we check access. If robots.txt blocks the crawler before it can fetch the file, it may never read the header at all. That is why blocking and deindexing are different steps. If we want Google to see the instruction, we usually need to allow the crawl first.
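We can reproduce that interaction with Python's standard-library robots.txt parser. The rules and URLs below are hypothetical; the point is that a disallowed path means the crawler never requests the file, so the header is never seen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /private/ path is blocked for all crawlers.
robots_txt = """
User-agent: *
Disallow: /private/
""".strip().splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)

# Blocked path: the crawl never happens, so any X-Robots-Tag on the
# PDF's response is never read.
print(rp.can_fetch("Googlebot", "https://example.com/private/report.pdf"))  # False

# Allowed path: the crawler can fetch the file and see the header.
print(rp.can_fetch("Googlebot", "https://example.com/files/report.pdf"))  # True
```

So to deindex a file that is currently blocked, we first open the path in robots.txt, let the crawler read the noindex header, and only then consider re-blocking it.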

Then we look for duplicates. A file can live in more than one place, and one copy may still be indexable. If that happens, we need to clean up the extra URLs or point them to the preferred version.

Finally, we give search engines time. Even when the header is correct, cached results can stick around until the next crawl. That is normal. The important part is making sure the live response is right.

A Quick Checklist for PDFs and Media Files

Before we ship a file setup, we usually run through a short list.

  • Use noindex on PDFs, images, or videos we do not want in search results.
  • Keep the file crawlable if we want search engines to read the header.
  • Put the header on the final response, especially after redirects.
  • Keep canonical signals aligned when the same file exists at multiple URLs.
  • Check the CDN, object storage, and server config after each deployment.
  • Review media libraries when crawl activity looks wasteful or uneven.

That checklist keeps us from mixing up crawling, indexing, and duplication. It also makes troubleshooting much easier later, because we know which layer is responsible for which decision.

Conclusion

The X-Robots-Tag header gives us control over files that HTML tags can’t handle well. That makes it one of the cleanest ways to manage PDFs, images, videos, and other non-HTML assets.

If we remember one thing, it’s this: the file has to be crawlable before the crawler can read the instruction. Once we get that part right, we can keep the right assets visible and keep the wrong ones out of search. That is a simple fix with a big payoff.
