Find Out the Number of Pages on a Website: A Free Tool & Step-by-Step Guide
Tyler Clay
Sep 6th · 4 min read
Free Tool to Find Number of Pages on Website
Today we are unwrapping all those hidden pages inside your favorite website.
If you need to find out the number of pages on your website, you’re in the right place. Maybe you just started maintaining a website and need to browse it for unknown pages, or maybe this is the start of a DIY SEO and security audit.
Whether you're doing an SEO audit, analyzing competitor sites, or just plain curious, the tool above will give you a good picture of a website's pages. The tool shows a count and info on the pages crawled, but is limited to 300 pages.
We plan to add a downloadable format of JSON or CSV soon with even more data crawled from your site.
1. Look at the sitemap
The easiest, but maybe not so obvious, way to find out the number of pages on a website is to check the sitemap. Most websites have a sitemap that lists all their pages, usually available at yourwebsite.com/sitemap.xml. Some websites might have multiple sitemaps, especially larger ones; the main sitemap will contain links to all the other sitemaps. Here's how to finesse this:
- Step 1: Open your web browser and go to yourwebsite.com/sitemap.xml (replace yourwebsite.com with the website you want to count pages on).
- Step 2: If you see a list of URLs to pages, go to the next step. But if the sitemap shows links to other sitemaps (e.g., /sitemap-pages.xml, /sitemap-posts.xml), you'll need to visit each one.
- Step 3: Copy the entire list from each sitemap.
- Step 4: Paste the list(s) into ChatGPT and ask, "How many unique URLs are in these lists?" This prompt ensures ChatGPT counts the total number of URLs across multiple sitemaps. In a majority of cases these sitemaps will have all the pages the site admin wants you to know about.
Warning: Some site owners might not include all their pages in the sitemap (either intentionally or unintentionally), meaning the sitemap might not be a complete reflection of the site’s true structure.
Dealing with Multiple Sitemaps:
- Visit each linked sitemap individually.
- Combine the lists from all the sitemaps into one document.
- Then, use ChatGPT or the Candy Creative page-scanning tool to count the total number of pages across all sitemaps.
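If you'd rather skip the copy-paste, a short script can do the counting for you. Below is a minimal sketch in Python (standard library only) that follows sitemap-index links and counts unique page URLs. The sitemap URL is a placeholder, and the sketch assumes plain, uncompressed XML sitemaps:

```python
# Minimal sketch: count unique page URLs in a sitemap, following any
# sitemap-index links. Assumes standard, uncompressed XML sitemaps.
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def collect_urls(sitemap_url, seen=None):
    """Gather page URLs from a sitemap or, recursively, a sitemap index."""
    if seen is None:
        seen = set()
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    # A <sitemapindex> points at child sitemaps; a <urlset> lists actual pages.
    for child in root.findall("sm:sitemap/sm:loc", NS):
        collect_urls(child.text.strip(), seen)
    for loc in root.findall("sm:url/sm:loc", NS):
        seen.add(loc.text.strip())
    return seen

# Placeholder domain: swap in the site you're counting.
urls = collect_urls("https://yourwebsite.com/sitemap.xml")
print(f"Found {len(urls)} unique URLs")
```

Tip: if the sitemap lives at a nonstandard path, robots.txt often points to it in a Sitemap: line.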
What if the sitemap is missing or incomplete?
- Missing Sitemap: If the site doesn't have a sitemap, or if you suspect the sitemap is incomplete, proceed to the next option below.
- Inconsistent Sitemap: If you think the sitemap isn’t fully truthful or might be excluding pages, you can also use the methods below or our tool to get a fuller picture.
2. Using Google Search
If you're unsure about the sitemap, or there isn't one, Google's site: search operator is a great next step. Go to Google and search site:yourwebsite.com, substituting the website you want to search. This will list all the pages on the website indexed by Google. Google, like most search engines, respects the website's robots.txt file, which tells Google's crawlers to ignore certain pages. This means Google might not show all the pages that actually exist on the site.
Understanding robots.txt:
- Step 1: Visit yourwebsite.com/robots.txt to view the file (replace yourwebsite.com with the actual domain).
- Step 2: Look for lines that start with Disallow: followed by a path. These paths indicate pages or directories that the site owner has told crawlers like Google's not to crawl.
Robots Trick: Sometimes webmasters will intentionally hide certain directories for security purposes, and they usually don't want those pages showing up in search engine results either. As a result, you may find hidden pages listed right in the robots.txt. For example:
User-agent: *
Disallow: /private-directory/
This tells search engine crawlers not to crawl anything under /private-directory/, so it generally won't show up in Google's search results.
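If you'd like to pull those disallowed paths out programmatically, here's a minimal sketch in Python. The domain is a placeholder, and the script simply prints every Disallow path it finds:

```python
# Minimal sketch: list the Disallow rules in a site's robots.txt.
# The domain is a placeholder; swap in the site you're auditing.
import urllib.request

with urllib.request.urlopen("https://yourwebsite.com/robots.txt") as resp:
    robots = resp.read().decode("utf-8", errors="replace")

for line in robots.splitlines():
    line = line.strip()
    if line.lower().startswith("disallow:"):
        path = line.split(":", 1)[1].strip()
        if path:
            print(path)  # a path the owner told crawlers to stay out of
```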
Handling Hidden or Hard-to-Discover URLs:
Some URLs on a website might be extremely hard to discover due to their structure or because they are not linked to other pages. This is where things can get a bit tricky:
Brute Forcing URLs:
Brute forcing is basically trying every possible combination until you succeed. A URL can be very long and may have to be exact to reveal a document, so guessing all combinations is quite resource-intensive:
- It requires visiting a vast number of potential URLs.
- It can be time-consuming and resource-intensive.
- Many websites use rate limiting, which slows down or blocks your requests if you make too many in a short period.
Pattern Recognition:
If URLs have identifiable patterns (e.g., /product/123, /user/john-doe), you can use the patterns you've discovered to generate more URLs. For instance, if you notice URLs in a series (e.g., /page/1, /page/2), you can write scripts to generate and check those URLs.
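As a rough illustration, here's a minimal sketch that walks a numeric series like /page/1, /page/2, ... and stops after a few consecutive misses. The base URL and the cutoffs are placeholder assumptions:

```python
# Minimal sketch: walk a numeric URL series and stop after repeated misses.
# BASE and the cutoffs (3 misses, 500 pages) are placeholder assumptions.
import urllib.error
import urllib.request

BASE = "https://yourwebsite.com/page/{}"

found, misses, n = [], 0, 1
while misses < 3 and n <= 500:
    try:
        with urllib.request.urlopen(BASE.format(n), timeout=10):
            found.append(BASE.format(n))  # request succeeded: the page exists
            misses = 0
    except urllib.error.URLError:
        misses += 1  # 404 or network error: count it as a gap in the series
    n += 1

print(f"Found {len(found)} pages in the series")
```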
For Technical Users:
If you're technically gifted, you can proceed by guessing URLs generated from common path lists. Here's a list of common URL paths you can use. This method can be hard if you aren't comfortable with scripting, but you could use a tool like ChatGPT to build a script that:
- Iterates through any common paths or guesses you might have.
- Sends an HTTP request to each path.
- Observes the responses, looking for status codes like 200 (OK), which indicate the URL exists.
- Crawls any links that exist on exposed pages too (this can get deep if there are a lot of pages with many links).
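Here's a minimal sketch of such a script in Python, using only the standard library. The domain and the short path list are placeholders; in practice you'd feed it a much longer wordlist. The one-second pause is there because of the rate limiting discussed next:

```python
# Minimal sketch: probe a wordlist of common paths and report the ones
# that respond 200 OK. BASE and COMMON_PATHS are placeholder assumptions.
import time
import urllib.error
import urllib.request

BASE = "https://yourwebsite.com"
COMMON_PATHS = ["/about", "/blog", "/admin", "/login", "/old", "/backup"]

found = []
for path in COMMON_PATHS:
    req = urllib.request.Request(BASE + path, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            if resp.status == 200:  # 200 OK: the URL exists
                found.append(BASE + path)
    except (urllib.error.URLError, TimeoutError):
        pass  # 404s, timeouts, and the like mean no page here
    time.sleep(1)  # be polite: pace requests to stay under rate limits

print(f"{len(found)} paths responded 200 OK:")
print("\n".join(found))
```

A HEAD request asks for headers only, which keeps each probe light; some servers reject HEAD, in which case you'd fall back to a normal GET.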
Rate Limiting and IP Blocking:
Be aware that many sites have rate limiting, which restricts the number of requests you can make in a short period. If you exceed this limit, the site might slow down your requests or even block your IP address. Proceed cautiously if you go down this route.
3. Using Paid SEO Tools
If you're not technically inclined and the above methods didn't get you the data you need, your next best option is to use paid SEO tools. These tools crawl websites and provide detailed reports on the number of pages, SEO details, and much more. Screaming Frog is a well-known paid solution. We did not find many free tools, so we built the one at the top of this page, which you can use to crawl a website for a list of pages. It's a free tool and we have limited server bandwidth, so certain features of the crawler may be limited, and you can only crawl up to 300 pages. If you need a more robust tool, we recommend checking out Screaming Frog. Link: Screaming Frog SEO Spider
Closing Thoughts
It is actually pretty hard to figure out how many pages a website has. Wikipedia has over 200 million pages published; if you are crawling a site of that size, you will need custom or paid tools. Remember, sitemaps can lie, and site owners have excluded pages both intentionally and unintentionally. If you encounter a sitemap you feel is bogus, use custom scraping tools and other SEO tools to get a bigger picture. And if you're ready to dig deeper down the custom route, be prepared to deal with challenges like rate limiting and IP blocking. Need help with your site's SEO or web design? Holla at Candy Creative, where we're dripping candy-coated creativity all over the web!
In Conclusion
And that's the play! I hope this helped you determine the number of pages on your website. If you wanna help support us, share the knowledge with your friends! 🍬