How to validate broken links on a webpage using Playwright?
1. To handle broken links in Playwright, you typically follow a three-step process:
1) Extract all links from the page
Use a locator to find all anchor (<a>) tags, then extract their href attributes.
2) Filter out invalid or duplicate URLs
Use a Set to remove duplicate links automatically, and filter out non-HTTP links (such as mailto: or tel:).
3) Verify each URL
Send an HTTP request to each URL and check its status code.
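The filtering step can be sketched as a small pure function. This is only an illustration: the name `uniqueHttpLinks` and the choice to strip `#fragment` parts are assumptions, not part of Playwright. The raw href values would come from the page, for example via `page.locator('a[href]')`.

```typescript
// Hypothetical helper: turns raw href values into a unique list of HTTP(S) URLs.
function uniqueHttpLinks(hrefs: string[], baseUrl: string): string[] {
  const seen = new Set<string>(); // a Set silently ignores duplicates
  for (const href of hrefs) {
    let url: URL;
    try {
      // Resolve relative links such as "/about" against the page URL.
      url = new URL(href, baseUrl);
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    // Drop mailto:, tel:, javascript: and other non-HTTP schemes.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    // Assumption: "#section" variants point at the same resource.
    url.hash = '';
    seen.add(url.href);
  }
  return Array.from(seen);
}
```

Resolving against the page URL first means relative and absolute forms of the same link collapse into one entry before any request is sent.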
2. Implementation (Playwright, JS/TS)
A script for this can use page.request.get() to check each link.
It only fetches the HTTP response, without rendering the full UI for every link.
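A minimal sketch of such a check is shown here. To keep it runnable without a browser, the helper depends only on a small `StatusClient` interface; Playwright's `page.request` should satisfy that shape structurally. The names `StatusClient`, `LinkResult`, and `findBrokenLinks`, and the rule "status 400 or above means broken", are assumptions made for illustration.

```typescript
// Minimal shape the checker needs; `page.request` in Playwright matches it.
interface StatusClient {
  get(url: string): Promise<{ status(): number }>;
}

interface LinkResult {
  url: string;
  status: number | null; // null: the request itself failed (DNS, timeout, ...)
}

// Hypothetical helper: requests every URL and returns the ones that look broken.
async function findBrokenLinks(client: StatusClient, urls: string[]): Promise<LinkResult[]> {
  const broken: LinkResult[] = [];
  for (const url of urls) {
    try {
      const response = await client.get(url);
      // Assumption: 4xx and 5xx count as broken; followed redirects pass.
      if (response.status() >= 400) {
        broken.push({ url, status: response.status() });
      }
    } catch {
      broken.push({ url, status: null });
    }
  }
  return broken;
}

// Possible wiring inside a Playwright test (assumes @playwright/test is installed):
//
//   test('page has no broken links', async ({ page }) => {
//     await page.goto('https://example.com'); // placeholder URL
//     const hrefs = await page.locator('a[href]').evaluateAll(
//       (anchors) => anchors.map((a) => (a as HTMLAnchorElement).href));
//     const urls = Array.from(new Set(hrefs)).filter((u) => u.startsWith('http'));
//     expect(await findBrokenLinks(page.request, urls)).toEqual([]);
//   });
```

Returning the list of failures, rather than asserting inside the loop, lets the test report every broken link in one run.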
3. Key Strategies for Robust Testing
Soft Assertions: Use expect.soft() so the test continues with the remaining checks even after one of them fails. For example, if one link is broken (e.g., a 404), the test still checks the other links and reports all failures at the end.
HEAD vs. GET: Use page.request.head() if you only need the status code; it is faster because it doesn't download the entire response body. However, some servers block HEAD requests, so GET is a safer fallback.
Handle Redirections: By default, Playwright follows redirects, so a redirected link reports the status of its final destination. If you want to catch 301/302 redirects, set the maxRedirects option to 0 in the request so that the redirect status itself is returned. Note that request.redirectedFrom() applies to requests the page makes in the browser, not to responses from page.request.
Synthetic Monitoring: For production sites, you can schedule these scripts to run periodically (e.g., via GitHub Actions) to catch links that break over time due to content updates.
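The HEAD-then-GET fallback and the redirect check can be combined in one small helper. This is a sketch under stated assumptions: `ProbeClient` and `probeStatus` are hypothetical names, the interface mirrors the `head()` and `get()` methods of `page.request`, and retrying with GET on any status of 400 or above is one possible policy rather than a rule.

```typescript
interface ProbeOptions {
  maxRedirects?: number;
}

// Minimal shape of the request methods used here; `page.request` matches it.
interface ProbeClient {
  head(url: string, options?: ProbeOptions): Promise<{ status(): number }>;
  get(url: string, options?: ProbeOptions): Promise<{ status(): number }>;
}

// Hypothetical helper: returns the status of a URL without following redirects.
async function probeStatus(client: ProbeClient, url: string): Promise<number> {
  // maxRedirects: 0 makes a 301/302 visible instead of silently following it.
  const options: ProbeOptions = { maxRedirects: 0 };

  const headResponse = await client.head(url, options);
  if (headResponse.status() < 400) {
    return headResponse.status();
  }

  // Assumption: an error status on HEAD may only mean the server rejects HEAD,
  // so confirm with a GET before reporting the link as broken.
  const getResponse = await client.get(url, options);
  return getResponse.status();
}
```

In a test, each result could then be asserted with `expect.soft(status).toBeLessThan(400)` so that a single failure does not stop the remaining checks.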