Find Out the Number of Pages on a Website: A Free Tool & Step-by-Step Guide

Tyler Clay

Sep 6th · 4 min read

Free Tool to Find Number of Pages on Website

You must obtain explicit permission from the owner of the website before crawling it, as crawling without authorization may violate the website’s terms of service or legal regulations. We are not responsible for any misuse or unauthorized access that results from using this tool.

Some websites implement rate-limiting measures that may make this crawler unusable or significantly slow down the process. You can try increasing the timeout between requests to mitigate this, but we cannot guarantee the tool’s performance on all websites. The crawling process typically takes 3-4 minutes to complete, depending on the size and structure of the site and any rate-limiting imposed by the server.

This tool has a limit of 300 URLs per crawl to protect our server capacity, as we do not have infinite processing power. Attempting to crawl beyond this limit may result in incomplete data or errors.

We provide no guarantees regarding availability, performance, or accuracy of results; the tool is provided as-is, and we are not liable for any damages, data loss, or misuse. By using this tool, you agree to do so legally and ethically, ensuring no harm or disruption to services, and no violations of laws or regulations. To protect our server infrastructure, repeated or abusive use of this tool may result in restricted access. By proceeding, you acknowledge that you have read and understood this disclaimer and agree to use the tool responsibly.

Today we are unwrapping all those hidden pages inside your favorite website. 

If you need to find out the number of pages on your website, you’re in the right place. Maybe you just started maintaining a website and need to browse it for unknown pages, or maybe this is the start of a DIY SEO and security audit. 

Whether you're doing an SEO audit, analyzing competitor sites, or just plain curious, the tool above will give you a good idea of how many pages the website has. It shows a count and details for each page it crawls, but it is limited to 300 pages.

We plan to add a downloadable JSON or CSV export soon, with even more data crawled from your site.


1. Look at the sitemap

The easiest, but maybe not so obvious, way to find out the number of pages on a website is to check the sitemap. Most websites have a sitemap that lists all their pages, usually available at yourwebsite.com/sitemap.xml. Some websites, especially larger ones, have multiple sitemaps. In that case the main sitemap acts as an index and links to all the other sitemaps.

Here's how to finesse this:
  • Step 1: Open your web browser and go to yourwebsite.com/sitemap.xml (replace yourwebsite.com with the website you want to count pages on).
  • Step 2: If you see a list of URLs to pages, go to the next step. But if the sitemap shows links to other sitemaps (e.g., /sitemap-pages.xml, /sitemap-posts.xml), you’ll need to visit each one.
  • Step 3: Copy the entire list from each sitemap.
  • Step 4: Paste the list(s) into ChatGPT and ask, “How many unique URLs are in these lists?” This prompt ensures ChatGPT counts the total number of URLs across multiple sitemaps. In a majority of cases these sitemaps will have all the pages the site admin wants you to know about.
Warning: Some site owners might not include all their pages in the sitemap (either intentionally or unintentionally), meaning the sitemap might not be a complete reflection of the site’s true structure.
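
If you'd rather count the URLs yourself than paste them into a chatbot, here is a minimal Python sketch. It assumes you have already saved the sitemap's XML as text and that the sitemap follows the standard sitemaps.org layout (a urlset containing url entries, each with a loc). The function name and the sample content are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def unique_sitemap_urls(xml_text: str) -> set[str]:
    """Return the unique page URLs listed in a single sitemap document."""
    root = ET.fromstring(xml_text)
    urls = set()
    for entry in root.iter():
        if local_name(entry.tag) != "url":
            continue
        # Only read the <loc> that sits directly under <url>, so extras
        # such as image entries are not counted as pages.
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                urls.add(child.text.strip())
    return urls


# Made-up sitemap content for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourwebsite.com/</loc></url>
  <url><loc>https://yourwebsite.com/about</loc></url>
  <url><loc>https://yourwebsite.com/about</loc></url>
</urlset>"""

print(len(unique_sitemap_urls(sample)))  # 2 (the duplicate is counted once)
```

Using a set means a URL that appears twice is only counted once, which is the "unique URLs" part of the prompt above.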


Dealing with Multiple Sitemaps:

  • Visit each linked sitemap individually.
  • Combine the lists from all the sitemaps into one document.
  • Then, use ChatGPT or the Candy Creative page scanning tool to count the total number of pages across all sitemaps.
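
The visit-each-sitemap-and-combine routine can also be sketched in Python. This is a rough sketch, not a finished tool: it assumes the main sitemap is a standard sitemap index, the names collect_page_urls and fetch_text are hypothetical, and the demo uses made-up sitemap contents instead of real downloads. Only point it at sites you have permission to crawl.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def fetch_text(url):
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_page_urls(sitemap_url, fetch=fetch_text, seen=None):
    """Follow a sitemap (or sitemap index) and return all unique page URLs."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # avoid looping if sitemaps link to each other
        return set()
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    pages = set()
    for entry in root:
        locs = [c.text.strip() for c in entry
                if local_name(c.tag) == "loc" and c.text]
        if not locs:
            continue
        if local_name(entry.tag) == "sitemap":  # link to another sitemap
            pages |= collect_page_urls(locs[0], fetch, seen)
        elif local_name(entry.tag) == "url":    # link to an actual page
            pages.add(locs[0])
    return pages


# Made-up sitemap contents standing in for real downloads.
NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
fake_site = {
    "https://yourwebsite.com/sitemap.xml":
        f"<sitemapindex {NS}>"
        "<sitemap><loc>https://yourwebsite.com/sitemap-pages.xml</loc></sitemap>"
        "<sitemap><loc>https://yourwebsite.com/sitemap-posts.xml</loc></sitemap>"
        "</sitemapindex>",
    "https://yourwebsite.com/sitemap-pages.xml":
        f"<urlset {NS}>"
        "<url><loc>https://yourwebsite.com/</loc></url>"
        "<url><loc>https://yourwebsite.com/about</loc></url>"
        "</urlset>",
    "https://yourwebsite.com/sitemap-posts.xml":
        f"<urlset {NS}>"
        "<url><loc>https://yourwebsite.com/blog/hello</loc></url>"
        "<url><loc>https://yourwebsite.com/about</loc></url>"
        "</urlset>",
}

pages = collect_page_urls("https://yourwebsite.com/sitemap.xml",
                          fetch=fake_site.__getitem__)
print(len(pages))  # 3
```

To try it on a real site, drop the fetch argument so the default fetch_text downloads each sitemap. Compressed sitemaps (.xml.gz) are not handled in this sketch.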

What if the sitemap is missing or incomplete?
  • Missing Sitemap: If the site doesn't have a sitemap, or if you suspect the sitemap is incomplete, proceed to the next option below.
  • Inconsistent Sitemap: If you think the sitemap isn’t fully truthful or might be excluding pages, you can also use the methods below or our tool to get a fuller picture.


2. Using Google Search

If you’re unsure about the sitemap or there isn’t one, Google’s site: operator is a great next step. Go to Google and search site:yourwebsite.com, substituting the website you want to search. This will list the pages on the website that Google has indexed. Google, like most search engines, respects the website’s robots.txt file, which tells crawlers to ignore certain pages. This means Google might not show all the pages that actually exist on the site.


Understanding robots.txt:

  • Step 1: Visit yourwebsite.com/robots.txt to view the file (replace yourwebsite.com with the actual domain).
  • Step 2: Look for lines that start with Disallow: followed by a path. These paths indicate pages or directories that the site owner has asked crawlers, including Google’s, to stay out of.


Robots Trick:
Sometimes webmasters intentionally hide certain directories for security purposes, and they usually don't want those pages indexed by search engine crawlers either. As a result, you may find hidden pages listed in robots.txt.
Example of paths being hidden in robots.txt:
User-agent: *
Disallow: /private-directory/
This tells search engine crawlers not to crawl anything under /private-directory/, so it usually won't show up in Google’s search results.
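
If the robots.txt is long, a few lines of Python can pull out the Disallow paths for you. This is a simple sketch with a made-up function name. It assumes you have the file's text already, and it ignores which User-agent group each rule belongs to, since the goal here is just to spot paths worth a look.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every unique Disallow path in a robots.txt, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        path = value.strip()
        # An empty "Disallow:" means nothing is blocked, so skip it.
        if field.strip().lower() == "disallow" and path and path not in paths:
            paths.append(path)
    return paths


sample = """User-agent: *
Disallow: /private-directory/
"""

print(disallowed_paths(sample))  # ['/private-directory/']
```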


Handling Hidden or Hard-to-Discover URLs:

Some URLs on a website might be extremely hard to discover due to their structure or because they are not linked to other pages. This is where things can get a bit tricky:


Brute Forcing URLs:

Brute forcing means trying every possible combination until you succeed. A URL can be very long and may have to match exactly to reveal a document, so guessing every combination is rarely practical:
  • It requires visiting a vast number of potential URLs.
  • It can be time-consuming and resource-intensive.
  • Many websites use rate limiting, which slows down or blocks your requests if you make too many in a short period.

Pattern Recognition:

If URLs have identifiable patterns (e.g., /product/123, /user/john-doe), you can use the patterns you’ve discovered to generate more URLs. For instance, if you notice URLs in a series (e.g., /page/1, /page/2), you can write scripts to generate and check those URLs.
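
As a small illustration, here is one way such a script could generate candidate URLs from a pattern. The function name, the template format, and the number range are all assumptions made for this example. It only builds the list; it does not request anything.

```python
from urllib.parse import urljoin


def urls_from_pattern(base_url, template, values):
    """Fill a path template like '/page/{}' with each value and return full URLs."""
    return [urljoin(base_url, template.format(value)) for value in values]


candidates = urls_from_pattern("https://yourwebsite.com", "/page/{}", range(1, 4))
for url in candidates:
    print(url)
# https://yourwebsite.com/page/1
# https://yourwebsite.com/page/2
# https://yourwebsite.com/page/3
```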


For Technical Users:

If you’re technically gifted, you can proceed by guessing URLs generated from common path lists. Here's a list of common URL paths you can use. However, this method can get difficult if you aren't comfortable with scripting. You could use a tool like ChatGPT to build a script that:
  • Iterates through any common paths or guesses you might have.
  • Sends HTTP requests to each path.
  • Observes the responses, looking for status codes like 200 (OK), which indicate the URL exists.
  • Crawls any links that exist on exposed pages as well (this can get deep if there are a lot of pages with many links).
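
Here is a rough sketch of what a script following those four steps might look like, using only Python's standard library. Everything named here (discover_pages, fetch_page, the delay and limit defaults) is an assumption for illustration, not how our tool works under the hood. Only run something like this against sites you have permission to crawl.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def fetch_page(url):
    """Return (status_code, html); status 0 means the request failed outright."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        return error.code, ""
    except URLError:
        return 0, ""


def discover_pages(base_url, paths, fetch=fetch_page, delay=1.0, limit=300):
    """Check guessed paths, then follow same-site links found on pages that exist."""
    site = urlparse(base_url).netloc
    queue = deque(dict.fromkeys(urljoin(base_url, p) for p in paths))
    seen = set(queue)
    found, checked = [], 0
    while queue and checked < limit:
        url = queue.popleft()
        checked += 1
        status, html = fetch(url)
        if status == 200:  # 200 (OK) means the URL exists
            found.append(url)
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.hrefs:
                link = urljoin(url, href).split("#", 1)[0]
                if urlparse(link).netloc == site and link not in seen:
                    seen.add(link)
                    queue.append(link)
        time.sleep(delay)  # pause between requests to go easy on the server
    return found
```

A call might look like discover_pages("https://yourwebsite.com", ["/", "/admin", "/blog"]). The limit argument caps how many URLs get checked, so a site with lots of links can't send the crawl spiraling.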

Rate Limiting and IP Blocking:

Be aware that many sites have rate limiting, which restricts the number of requests you can make in a short period. If you exceed this limit, the site might slow down your requests or even block your IP address. Proceed cautiously if you go down this route.
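
One common way to proceed cautiously is to back off when the server signals you are going too fast. The sketch below assumes the site answers with status 429 (Too Many Requests) or 503 when it is rate limiting you, which is typical but not guaranteed. The function name, the delays, and the retry count are made up for illustration. fetch is any function that takes a URL and returns a status code.

```python
import time


def polite_fetch(url, fetch, sleep=time.sleep, base_delay=1.0, max_retries=4):
    """Call fetch(url); after a 429 or 503, wait and retry, doubling the wait each time."""
    delay = base_delay
    status = None
    for attempt in range(max_retries + 1):
        status = fetch(url)
        if status not in (429, 503):
            return status  # not rate limited, so we are done
        if attempt < max_retries:
            sleep(delay)
            delay *= 2  # wait 1s, then 2s, then 4s, and so on
    return status  # still rate limited after all retries


# Demo with a made-up server that rate limits the first two requests.
responses = iter([429, 429, 200])
waits = []
result = polite_fetch("https://yourwebsite.com/page/1",
                      fetch=lambda url: next(responses),
                      sleep=waits.append)
print(result, waits)  # 200 [1.0, 2.0]
```

The demo swaps in a fake sleep so nothing actually waits. In a real script you would leave the sleep argument alone.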


3. Using Paid SEO Tools

If you’re not technically inclined and the above methods didn't get you the data you need, your next best option is to use paid SEO tools. These tools crawl websites and provide detailed reports on the number of pages, SEO details, and much more.

There are paid solutions for figuring out how many pages a website has, like Screaming Frog. We did not find many free tools, so we built the one at the top of this page, which you can use to crawl a website for a list of pages. It is free, and we have limited server bandwidth, so certain features of the crawler may be limited. You can only crawl up to 300 pages. If you need a more robust tool, we recommend checking out Screaming Frog.
Link: Screaming Frog SEO Spider


Closing Thoughts

It is actually pretty hard to figure out how many pages a website has. A site like Wikipedia has many millions of pages published. If you are crawling a site of this size, you will need custom or paid tools. Remember, sitemaps can lie, and site owners have intentionally or unintentionally excluded pages before. If you encounter a sitemap you feel is bogus, use custom scraping tools and other SEO tools to get a bigger picture. If you're ready to dig deeper down the custom route, be prepared to deal with challenges like rate limiting and IP blocking.

Need help with your site’s SEO or web design? Holla at Candy Creative, where we’re dripping candy-coated creativity all over the web!

In Conclusion

And there's the play! I hope this helped you determine the number of pages on your website. If you wanna help support us, share the knowledge with your friends! 🍬
