How to validate broken links on a webpage using Playwright?

1. Handling broken links in Playwright typically follows a three-step process:

1) Extract all links from the page

Use a locator to find all anchor (<a>) tags, then read each tag's href attribute.

2) Filter out invalid or duplicate URLs

Use a Set to remove duplicate links automatically, and filter out non-HTTP links such as mailto:, tel:, and javascript: URLs.

3) Verify each URL

Send an HTTP request to each remaining URL and check its status code.
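
Step 2 can be sketched as a small helper that has no Playwright dependency, which makes it easy to test on its own. This is a minimal sketch, not part of the original script: the function name collectCheckableUrls and the example URLs are hypothetical, and dropping the #fragment before deduplication is an added assumption (two links that differ only by fragment point to the same resource).

```typescript
// Hypothetical helper: turns raw href values into a deduplicated list of
// absolute http/https URLs that are worth checking.
function collectCheckableUrls(hrefs: Array<string | null>, baseUrl: string): string[] {
  const unique = new Set<string>();

  for (const href of hrefs) {
    if (!href) continue; // anchors without an href attribute

    let resolved: URL;
    try {
      // Resolve relative hrefs (e.g. "/about") against the page URL
      resolved = new URL(href, baseUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed as a URL
    }

    // Drop mailto:, tel:, javascript: and other non-HTTP schemes
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;

    // Assumption: ignore the fragment so "/about" and "/about#team" count once
    resolved.hash = '';
    unique.add(resolved.href);
  }

  return Array.from(unique);
}
```

In a Playwright test, the hrefs array would come from the getAttribute('href') calls in step 1, and baseUrl from page.url().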

2. Implementation in Playwright (JS/TS)

The script at the end of this post uses page.request.get() to check each link instead of navigating to it.

This fetches only the HTTP response without rendering the page in the browser, which is faster than a full navigation.

3. Key Strategies for Robust Testing

  • Soft Assertions: Use expect.soft() so the test keeps running after a failed check. For example, if one link is broken (e.g., a 404), the test still checks the remaining links and reports every failure at the end.

  • HEAD vs. GET: Use page.request.head() if you only need the status code; it is faster as it doesn't download the entire page body. However, some servers block HEAD requests, so GET is a safer fallback.

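  • Handle Redirections: By default, Playwright follows redirects, so a redirected link reports the status of its final destination. To catch 301/302 redirects with page.request, pass the maxRedirects: 0 option so the 3xx status is returned as-is. request.redirectedFrom() is available on requests made by browser navigation (for example, via the response returned by page.goto()), not on page.request responses.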

  • Synthetic Monitoring: For production sites, you can schedule these scripts to run periodically (e.g., via GitHub Actions) to catch links that break over time due to content updates.
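
The HEAD vs. GET and redirect points above can be combined into one helper. The following is a minimal sketch under stated assumptions, not a definitive implementation: LinkClient, StatusResponse, LinkResult, toResult, and checkLink are hypothetical names, and the rule "retry with GET whenever HEAD fails or returns 400 or above" is one possible heuristic. LinkClient mirrors only the head() and get() calls of Playwright's request API, so page.request can be passed in directly.

```typescript
// Minimal shape of the Playwright request API used below (hypothetical names).
interface StatusResponse {
  status(): number;
}

interface LinkClient {
  head(url: string, options?: { maxRedirects?: number }): Promise<StatusResponse>;
  get(url: string, options?: { maxRedirects?: number }): Promise<StatusResponse>;
}

interface LinkResult {
  url: string;
  status: number;
  method: 'HEAD' | 'GET';
  redirected: boolean; // true for any 3xx status
  broken: boolean;     // true for any status of 400 or above
}

function toResult(url: string, status: number, method: 'HEAD' | 'GET'): LinkResult {
  return { url, status, method, redirected: status >= 300 && status < 400, broken: status >= 400 };
}

async function checkLink(client: LinkClient, url: string): Promise<LinkResult> {
  // maxRedirects: 0 asks Playwright not to follow redirects, so 301/302 stay visible
  const options = { maxRedirects: 0 };

  try {
    const head = await client.head(url, options);
    if (head.status() < 400) {
      return toResult(url, head.status(), 'HEAD');
    }
  } catch {
    // HEAD failed at the network level; fall through and retry with GET
  }

  // Assumption: any HEAD error is re-checked with GET, because some servers
  // reject HEAD even though the page itself is reachable
  const get = await client.get(url, options);
  return toResult(url, get.status(), 'GET');
}
```

Inside the validation loop, this could be used as const result = await checkLink(page.request, url); followed by a soft assertion that result.broken is false. Whether a redirect should fail the test or only be logged is a project decision.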



import { test, expect } from '@playwright/test';

test('validate broken links on the webpage', async ({ page }) => {
  await page.goto('https://google.com');

  // Collect the href attribute of every anchor tag on the page
  const allLinksOnWebPage = await page.locator('a').all();
  const uniqueUrls = new Set<string>();

  for (const link of allLinksOnWebPage) {
    const href = await link.getAttribute('href');
    if (!href) continue;

    try {
      // Resolve relative URLs to absolute ones
      const absoluteUrl = new URL(href, page.url());

      // Keep only http/https links; the Set ignores duplicates
      if (absoluteUrl.protocol === 'http:' || absoluteUrl.protocol === 'https:') {
        uniqueUrls.add(absoluteUrl.href);
      }
    } catch {
      // Skip hrefs that cannot be parsed as a URL
    }
  }

  // Validate each URL
  for (const url of uniqueUrls) {
    try {
      // Fetch the response without navigating the browser to the page
      const response = await page.request.get(url);
      console.log(`${response.status()} ${url}`);

      // Use soft assertions to collect all broken links without stopping the test
      expect.soft(response.ok(), `Broken link found: ${url} (Status: ${response.status()})`).toBeTruthy();
    } catch (error) {
      // A network-level failure (DNS error, timeout, refused connection) also counts as a broken link
      expect.soft(false, `Error checking ${url}: ${error}`).toBeTruthy();
    }
  }
});
