
How to Handle Pagination and Infinite Scroll When Web Scraping

A technical guide to scraping paginated content: numbered pages, next-page buttons, load-more buttons, and infinite scroll. Learn how each pagination type works and how to handle them automatically.

TL;DR

Pagination is one of the biggest obstacles in web scraping — most data you want is spread across multiple pages or loaded dynamically. ScrapeMaster handles all four pagination types automatically: numbered pages, next-page buttons, load-more buttons, and infinite scroll. This guide explains how each type works under the hood and how to scrape them.

Why pagination matters for scraping

When you scrape a website, the first page of results is easy. The problem is that the first page almost never contains all the data. A job board might show 25 results per page out of 1,200 total. An e-commerce site might display 48 products out of 5,000. A directory might list 20 businesses per page across 50 pages.

If you only scrape the first page, you get a fraction of the available data. To get everything, your scraper needs to understand how the site loads additional content and follow that mechanism through to the end.

This is where most scraping attempts fail. The first page works fine. Getting pages 2 through 50 is where things break down.

The four types of pagination

Websites use four main approaches to split large datasets across multiple views. Each works differently under the hood, and each requires a different scraping strategy.

Type 1: Numbered pagination

What it looks like: Page numbers at the bottom of the results — "1 2 3 4 5 ... 50" — with the current page highlighted. Sometimes displayed as "Page 1 of 50" with clickable numbers.

How it works technically: Each page number is a link to a URL with a page parameter. For example:

  • Page 1: example.com/search?q=laptops&page=1
  • Page 2: example.com/search?q=laptops&page=2
  • Page 3: example.com/search?q=laptops&page=3

When you click a page number, the browser navigates to that URL. The server processes the request and returns a new HTML page with the next set of results. The entire page reloads.

Some sites use URL fragments instead: example.com/search?q=laptops#page=2. Others use POST parameters that are not visible in the URL.
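In code, generating that sequence of page URLs is a one-liner. A minimal Python sketch, using the hypothetical example.com URL pattern above:

```python
from urllib.parse import urlencode

def page_url(base: str, query: str, page: int) -> str:
    """Build the URL for one results page of a hypothetical search endpoint."""
    return f"{base}?{urlencode({'q': query, 'page': page})}"

# e.g. https://example.com/search?q=laptops&page=1 ... &page=3
urls = [page_url("https://example.com/search", "laptops", n) for n in range(1, 4)]
```

Real sites vary the parameter name (`page`, `p`, `offset`, `start`), so inspect a few page links before assuming this pattern.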

Where you see it: Google search results, most e-commerce sites (Amazon, eBay), government databases, Zillow, academic search engines, and many traditional web applications.

Scraping challenges:

  • The total number of pages may not be obvious — you need to find the last page number or a "total results" count
  • Some sites limit the visible page numbers (showing "1 2 3 ... 50" but not 4 through 49)
  • Each page navigation causes a full page reload, which takes time
  • The URL structure may change between sites

Type 2: Next-page buttons

What it looks like: A "Next" or ">" button (sometimes with a "Previous" or "<" button) without visible page numbers. The button navigates to the next page of results.

How it works technically: The next-page button is either a link or a button element. When clicked:

  • If it is a link — The browser navigates to a new URL, similar to numbered pagination. The page reloads with new results.
  • If it is a button — JavaScript handles the click event. It might update the URL, make an API call to fetch new data, or submit a form with updated pagination parameters.

The key difference from numbered pagination is that you cannot see how many pages exist. You keep clicking "Next" until the button disappears or becomes disabled, which signals you have reached the last page.
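That "click until the button is gone or disabled" loop can be sketched with Playwright's sync page API. The selectors here (`a.next, button.next` for the control, `.result` for rows) are placeholders you would replace per site:

```python
# Sketch: click "Next" until the control disappears or becomes disabled.
# `page` is a Playwright sync-API Page; selectors are hypothetical.
def paginate_next(page, next_sel="a.next, button.next", row_sel=".result"):
    rows = []
    while True:
        page.wait_for_load_state("networkidle")          # let the page settle
        rows += page.locator(row_sel).all_inner_texts()  # collect this page's rows
        nxt = page.locator(next_sel)
        if nxt.count() == 0 or nxt.first.is_disabled():  # hidden OR disabled = last page
            break
        nxt.first.click()
    return rows
```

Note the stop condition checks both cases the text mentions: a button that is removed from the DOM and one that is merely disabled.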

Where you see it: Forums, blog archives, some directories, search engines with simplified navigation, and older web applications.

Scraping challenges:

  • No way to know the total number of pages upfront — you scrape until the "Next" button is gone
  • The button might be styled differently on different sites ("Next," "Next Page," ">," an arrow icon, etc.)
  • Some sites disable the button on the last page instead of hiding it — your scraper needs to detect both cases
  • JavaScript-based next buttons may be harder to identify than simple links

Type 3: Load-more buttons

What it looks like: A "Load More," "Show More," or "View More Results" button at the bottom of the results list. Clicking it appends additional results to the existing list without navigating away from the page.

How it works technically: When the button is clicked:

  1. JavaScript sends an AJAX (XHR or Fetch) request to the server, asking for the next batch of results
  2. The server returns the data (usually as JSON or an HTML fragment)
  3. JavaScript inserts the new results below the existing ones
  4. The button either updates to load the next batch or disappears when there are no more results

The URL in the browser usually does not change. All results accumulate on the same page.

Where you see it: Social media feeds, modern e-commerce sites, image galleries, news sites, and single-page applications.

Scraping challenges:

  • The page does not reload — results are appended to the DOM, so traditional page-navigation scraping does not work
  • You need to wait for the new results to load before scraping them
  • The button may have different text or styling after each click
  • Rate limiting — some sites throttle how quickly you can load more results
  • The total number of results may not be known until you reach the end

Type 4: Infinite scroll

What it looks like: No pagination controls at all. New results load automatically as you scroll toward the bottom of the page. The page appears to be endlessly long.

How it works technically: The site attaches a scroll event listener (or uses an Intersection Observer) that triggers when you scroll near the bottom of the results. When triggered:

  1. JavaScript detects that the user is near the bottom of the content
  2. It sends a request to the server for the next batch of results
  3. The server returns the data
  4. JavaScript appends the new results to the bottom of the page
  5. The scroll area grows, allowing you to scroll further

This cycle repeats until there are no more results, at which point the scroll listener stops triggering or a "no more results" message appears.

Where you see it: Twitter/X, Instagram, Pinterest, Google Images, Reddit, YouTube, LinkedIn feeds, Google Maps results, and many modern web applications.

Scraping challenges:

  • No button to click — you need to simulate scrolling
  • Timing is critical — you need to scroll, wait for new content to load, then scroll again
  • You need to detect when you have reached the end (no more new content loads)
  • Pages can become extremely long, consuming significant memory in the browser
  • Some sites implement virtual scrolling — they remove elements that scroll out of view, meaning earlier results may not be in the DOM anymore
  • Load times vary — some batches load in milliseconds, others take seconds

How ScrapeMaster handles each pagination type

ScrapeMaster includes built-in handling for all four pagination types. When you enable pagination in the side panel, the AI detects which type the current site uses and applies the appropriate strategy.

Numbered pagination handling

ScrapeMaster identifies the page number elements and clicks through them sequentially. On each page:

  1. It waits for the page to fully load
  2. It extracts the data using the same column structure established on the first page
  3. It appends the new rows to the existing table
  4. It moves to the next page number

The process continues until the last page number has been processed. Duplicate header rows are automatically excluded.

Next-page button handling

ScrapeMaster identifies the "Next" button and clicks it repeatedly:

  1. It finds the next-page control (link or button)
  2. It clicks the control and waits for the page to load or update
  3. It extracts the new data
  4. It checks if the next-page control still exists and is not disabled
  5. If the control is still available, it repeats. If not, scraping is complete.

Load-more button handling

ScrapeMaster clicks the load-more button and waits for new content:

  1. It identifies the load-more button
  2. It clicks the button
  3. It waits for the DOM to update with new results
  4. It checks if the button is still present
  5. It repeats until the button disappears or no new content loads

After all content has been loaded, ScrapeMaster extracts data from the entire page at once — capturing all the results that accumulated from the repeated clicking.

Infinite scroll handling

ScrapeMaster simulates scrolling behavior:

  1. It scrolls to the bottom of the current content
  2. It waits for new content to load
  3. It detects whether new elements have appeared in the DOM
  4. It scrolls again if new content was loaded
  5. It stops when no new content loads after a reasonable wait period

Once all content has been loaded through scrolling, the AI extracts data from the full page.

Manual approaches to pagination (and why they are hard)

Understanding the manual alternatives helps explain why automated handling is valuable.

Manual approach to numbered pagination

With copy-paste: Visit each page, select the results, copy, paste into a spreadsheet, repeat 50 times. For a 50-page dataset at one minute per page, that is almost an hour of tedious work.

With Python (requests + BeautifulSoup):

You would write a loop that constructs URLs with incrementing page numbers, sends HTTP requests, parses the HTML response, and extracts the data. This works until the site uses JavaScript rendering, requires authentication cookies, or has anti-scraping measures.
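A sketch of that loop, assuming a hypothetical site whose results render as `<div class="result">` and whose pages are addressed with a `?page=N` parameter:

```python
# Numbered-pagination loop with requests + BeautifulSoup.
# Both the URL scheme and the CSS selector below are assumptions.
def scrape_numbered(base_url, max_pages=50):
    import requests                  # third-party: pip install requests
    from bs4 import BeautifulSoup    # third-party: pip install beautifulsoup4
    rows = []
    for page in range(1, max_pages + 1):
        resp = requests.get(base_url, params={"page": page}, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        results = soup.select("div.result")   # hypothetical selector
        if not results:                       # empty page => ran past the last page
            break
        rows.extend(r.get_text(strip=True) for r in results)
    return rows
```

This is the happy path; it has none of the retry logic, header spoofing, or session handling a production scraper tends to accumulate.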

With Selenium or Playwright:

You automate a real browser to navigate through pages. This handles JavaScript but adds complexity: you need to install browser drivers, handle waits and timeouts, and deal with browser crashes on long runs.

Manual approach to infinite scroll

With copy-paste: Scroll, wait, scroll, wait — for potentially hundreds of scroll actions — then try to select everything on what is now an enormously long page. Practically impossible for large datasets.

With Python (requests):

Infinite scroll does not work with simple HTTP requests because the content is loaded via JavaScript. You would need to reverse-engineer the API calls that the scroll triggers, figure out the parameters (cursor tokens, offsets, etc.), and replicate them in your script. This can work but requires significant technical investigation for each site.
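Once you have identified the underlying API, replaying it looks roughly like this. The endpoint and every field name here (`cursor`, `items`, `next_cursor`) are hypothetical — you would discover the real ones in the browser's network tab:

```python
def page_params(cursor=None, limit=50):
    """Query params for one batch; the first request carries no cursor."""
    params = {"limit": limit}
    if cursor is not None:
        params["cursor"] = cursor
    return params

def fetch_all(api_url):
    # requests is third-party; imported here so page_params stays importable anywhere
    import requests
    items, cursor = [], None
    while True:
        data = requests.get(api_url, params=page_params(cursor), timeout=10).json()
        items.extend(data.get("items", []))
        cursor = data.get("next_cursor")
        if not cursor:                 # no token returned => last batch
            break
    return items
```

Because the cursor is opaque and server-generated, you cannot parallelize these requests — each one depends on the token from the previous response.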

With Selenium or Playwright:

You automate scrolling in a real browser. The script scrolls down, waits for new content, checks if new elements appeared, and repeats. You need to handle:

  • Scroll timing (scrolling too fast and the content does not load; too slow and it takes forever)
  • End detection (how do you know there are no more results?)
  • Memory management (the page gets heavier with each scroll, eventually slowing the browser)
  • Virtual scrolling (if the site removes old elements as you scroll, you need to extract data before it disappears)
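The scroll-wait-check cycle above reduces to a small loop. A sketch using Playwright's sync page API, with a "page height stopped growing" heuristic for end detection (the pause length and scroll distance are guesses to tune per site):

```python
# Scroll until the document height stops growing (end-of-content heuristic).
# `page` is a Playwright sync-API Page.
def scroll_to_end(page, pause_ms=1500, max_rounds=200):
    last_height = 0
    for _ in range(max_rounds):
        page.mouse.wheel(0, 10_000)                          # scroll down
        page.wait_for_timeout(pause_ms)                      # let new content load
        height = page.evaluate("document.body.scrollHeight")
        if height == last_height:                            # nothing new appeared
            break
        last_height = height
```

On virtual-scrolling sites this loop is not enough — you would have to extract rows inside the loop, before they are evicted from the DOM.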

Manual approach to load-more buttons

With Python (requests):

Similar to infinite scroll — you need to find the API endpoint that the button triggers, replicate the request parameters, and paginate through the API responses. Sometimes the button sends a simple GET request with an offset parameter. Other times it sends complex POST requests with cursor tokens.
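For the simple offset-based case, the replay loop looks like this. The endpoint and parameter names (`offset`, `limit`, `items`) are stand-ins for whatever the network tab reveals:

```python
def batch_params(offset, page_size=25):
    """Query params for one hypothetical 'Load More' batch."""
    return {"offset": offset, "limit": page_size}

def fetch_batches(api_url):
    import requests  # third-party; deferred so batch_params works standalone
    items, offset = [], 0
    while True:
        data = requests.get(api_url, params=batch_params(offset), timeout=10).json()
        batch = data.get("items", [])
        if not batch:              # empty batch => no more results
            break
        items.extend(batch)
        offset += len(batch)
    return items
```

Advancing the offset by the actual batch length (rather than a fixed page size) tolerates the uneven batch sizes some servers return.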

With Selenium or Playwright:

Automate clicking the button, wait for new content, click again. Simpler than infinite scroll automation but still requires handling waits, detecting when the button disappears, and managing page weight.
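A sketch of that click loop with Playwright's sync API, using "did the item count grow?" to detect a stalled page. Both selectors are hypothetical:

```python
# Click "Load More" until it disappears or stops producing new items.
# `page` is a Playwright sync-API Page; selectors are placeholders.
def click_load_more(page, button_sel="button.load-more", item_sel=".result",
                    pause_ms=1500, max_clicks=100):
    for _ in range(max_clicks):
        button = page.locator(button_sel)
        if button.count() == 0:            # button gone => everything is loaded
            break
        before = page.locator(item_sel).count()
        button.first.click()
        page.wait_for_timeout(pause_ms)
        if page.locator(item_sel).count() == before:   # no new items appeared
            break
```

The second stop condition guards against sites that leave a dead button in place after the last batch.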

Common pagination edge cases

Sites that change pagination type based on device

Some sites use numbered pagination on desktop but infinite scroll on mobile. If your scraper gets the mobile version (due to viewport size or user agent), the pagination strategy you planned for may not work.

Solution: ScrapeMaster runs in your actual Chrome browser at whatever viewport size you are using. What you see is what it scrapes.

Pagination that resets when filters change

Applying a new filter on a paginated site often resets you to page 1. If your scraper applies filters mid-run, it may re-scrape already-seen pages.

Solution: Apply all filters before starting the scrape. Do not change filters while pagination is running.

Inconsistent page sizes

Some sites return different numbers of results on different pages — 25 on page 1, 23 on page 2, 30 on page 3. This can happen due to removed listings, sponsored results, or server-side variations.

Solution: This is usually not a problem for scraping. ScrapeMaster extracts whatever is on each page regardless of result count.

Pagination controls that load asynchronously

On some single-page applications, the pagination buttons themselves load after the page content. If a scraper looks for pagination controls before they exist in the DOM, it might conclude there is only one page.

Solution: ScrapeMaster waits for the page to fully render before analyzing pagination controls.

Duplicate results across pages

Some sites show overlapping results between pages (result 25 on page 1 also appears as result 1 on page 2). This is especially common on sites with real-time updates where new items are inserted between pages.

Solution: After export, deduplicate in your spreadsheet based on a unique identifier (like URL, name, or ID).
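If you prefer to deduplicate in code rather than in a spreadsheet, a first-occurrence-wins pass over the exported rows is enough. A stdlib-only sketch, keyed on a `"url"` column as an example:

```python
def dedupe(rows, key="url"):
    """Keep the first row seen for each unique key value."""
    seen, unique = set(), []
    for row in rows:
        k = row.get(key)
        if k in seen:
            continue
        seen.add(k)
        unique.append(row)
    return unique

rows = [{"url": "a", "name": "A"}, {"url": "b"}, {"url": "a", "name": "A2"}]
# dedupe(rows) keeps the first "a" row and drops the repeat
```

Keeping the first occurrence matches how overlapping pages usually duplicate: the repeat shows up later in the export.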

Token-based pagination (cursor pagination)

Instead of page numbers, some APIs use cursor tokens — opaque strings that the server generates to point to the next batch of results. The token changes with each request, so you cannot predict future page URLs.

Solution: Since ScrapeMaster operates in the browser and interacts with the site the way a user would (clicking buttons, scrolling), it does not need to understand cursor tokens. The browser handles the API calls, and ScrapeMaster extracts the rendered results.

Rate limiting and anti-bot measures

Some sites slow down or block requests if you paginate too quickly. You might see CAPTCHAs, error pages, or empty results.

Solution: ScrapeMaster operates at a pace similar to a human user, which generally stays within acceptable limits. If you encounter rate limiting, wait a few minutes and try again, or scrape in smaller batches.

Combining pagination with detail page following

The most powerful scraping configuration is pagination combined with detail page following. Here is how it works:

  1. First page: ScrapeMaster extracts the data visible in search results (e.g., names, prices, locations)
  2. Detail pages: For each result on the first page, ScrapeMaster clicks through to the detail page and extracts additional data (e.g., phone numbers, descriptions, specifications)
  3. Next page: ScrapeMaster navigates to the next page of results
  4. Repeat: It extracts search result data and detail page data for every subsequent page

This gives you a comprehensive dataset that combines the breadth of search results with the depth of individual detail pages.
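Hand-rolled, the breadth-plus-depth pattern is a nested loop: page over results, then fetch each detail page. A requests + BeautifulSoup sketch where every selector and the `?page=N` URL scheme are hypothetical:

```python
# Pagination combined with detail-page following (all selectors are assumptions).
def scrape_with_details(base_url, max_pages=50):
    import requests                  # pip install requests
    from bs4 import BeautifulSoup    # pip install beautifulsoup4
    jobs = []
    for page in range(1, max_pages + 1):
        listing = BeautifulSoup(
            requests.get(base_url, params={"page": page}, timeout=10).text,
            "html.parser")
        cards = listing.select("div.job-card")        # hypothetical selector
        if not cards:                                 # past the last page
            break
        for card in cards:
            job = {"title": card.select_one("h2").get_text(strip=True),
                   "detail_url": card.select_one("a")["href"]}
            detail = BeautifulSoup(
                requests.get(job["detail_url"], timeout=10).text, "html.parser")
            desc = detail.select_one(".description")  # hypothetical selector
            job["description"] = desc.get_text(strip=True) if desc else ""
            jobs.append(job)
    return jobs
```

Note the request count: 50 pages of 25 results means roughly 1,300 requests, which is where rate limiting (covered above) starts to bite.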

Example workflow: Scraping a job board

  • Search results show: job title, company name, location, salary range
  • Detail pages add: full job description, required skills, application deadline, company size
  • With pagination: you get this data for every job listing across all pages, not just the first 25

How to choose the right pagination strategy

You do not need to identify the pagination type manually when using ScrapeMaster — the extension detects it automatically. But understanding the types helps you:

  • Estimate scraping time — Numbered pagination with full page reloads is slower per page than infinite scroll with DOM appends
  • Plan batch sizes — For very large datasets, you might want to scrape in segments (pages 1 to 20, then 21 to 40) and combine exports
  • Debug issues — If pagination is not working as expected, knowing the type helps you troubleshoot (e.g., "this site uses infinite scroll but only loads 10 items at a time with a 2-second delay")

Performance expectations by pagination type

Numbered pagination

  • Speed: Moderate. Each page requires a full page load (1 to 5 seconds per page depending on the site).
  • Reliability: High. Page numbers are deterministic — page 5 is always page 5.
  • Data from 50 pages: Typically 2 to 5 minutes.

Next-page buttons

  • Speed: Similar to numbered pagination since each click usually triggers a page load.
  • Reliability: High, as long as the button is detected consistently.
  • Data from 50 pages: Typically 2 to 5 minutes.

Load-more buttons

  • Speed: Faster than page navigation because only the new data loads, not the entire page.
  • Reliability: Generally high. The main risk is the button changing location or label after loading.
  • Data from 50 clicks: Typically 1 to 3 minutes.

Infinite scroll

  • Speed: Variable. Depends on how quickly the site loads new content and how many items load per scroll event.
  • Reliability: Generally good, but detecting "end of content" can be tricky.
  • Data from equivalent of 50 pages: Typically 2 to 8 minutes depending on load times and batch sizes.

Frequently asked questions

Can ScrapeMaster handle all pagination types automatically?

Yes. ScrapeMaster detects and handles numbered pagination, next-page buttons, load-more buttons, and infinite scroll. Enable pagination in the side panel and the extension identifies the type and processes it automatically.

How does the extension know when to stop paginating?

For numbered pagination, it stops after the last page number. For next-page buttons, it stops when the button disappears or becomes disabled. For load-more buttons, it stops when the button is no longer present. For infinite scroll, it stops when scrolling no longer triggers new content to load.

What if a site uses a combination of pagination types?

Some sites do combine types — for example, infinite scroll within a page that also has numbered pages. ScrapeMaster handles the dominant pagination mechanism. If results are not complete, you can manually adjust your approach (e.g., scroll to load all content on one page, then let the extension handle the page-to-page navigation).

Does pagination scraping work on sites that require login?

Yes. Since ScrapeMaster runs inside your browser session, it uses your existing authentication. Log in first, navigate to the paginated content, then enable pagination in the extension.

How long does it take to scrape a paginated dataset?

It depends on the number of pages and the site's load time. A typical paginated site with 50 pages takes 2 to 5 minutes. Infinite scroll varies more widely — a feed with 1,000 items might take 5 to 10 minutes depending on how quickly the site loads new batches.

Can I stop pagination mid-way through?

Yes. You can stop the pagination process at any point from the ScrapeMaster side panel. The data collected up to that point is preserved in the table and can be exported.

Do I need to keep the browser tab active while paginating?

The tab should remain open and not be navigated away from. You can switch to other tabs, but do not close or navigate the tab that ScrapeMaster is actively paginating.

What happens if the site rate-limits me during pagination?

If the site returns errors or empty results due to rate limiting, try again after a short wait. For sites with aggressive rate limiting, consider scraping smaller subsets of data (using tighter search filters) rather than trying to paginate through the entire dataset at once.

Is infinite scroll harder to scrape than numbered pages?

Historically, yes — infinite scroll was significantly harder because it required simulating scroll behavior and managing memory. With ScrapeMaster, the difficulty is abstracted away. The extension handles scroll simulation, content detection, and end-of-content detection automatically. From a user perspective, scraping infinite scroll is as easy as scraping numbered pages: enable pagination and wait.

Bottom line

Pagination is the difference between getting one page of data and getting all the data. Whether a site uses numbered pages, next-page buttons, load-more buttons, or infinite scroll, ScrapeMaster detects the pagination type and handles it automatically. Enable pagination in the side panel, and the extension collects every page into a single exportable table. No coding, no configuration, no manual page-clicking. Free and unlimited.

Try our free Chrome extensions

Privacy-first tools that actually work. No paywalls, no tracking, no data collection.