Language: Python
Web
Mechanize is a Python library for stateful programmatic web browsing. It allows you to automate interaction with websites, including filling forms, clicking links, and handling cookies and sessions, much like a web browser. It originated as a Python port of Perl's WWW::Mechanize module and is particularly useful for automating repetitive web tasks or interacting with sites that require form submissions.
pip install mechanize
conda install -c conda-forge mechanize
Mechanize provides a browser-like interface in Python: you can navigate pages, select forms, fill them out, submit them, and retrieve responses. It handles cookies, redirects, and headers automatically.
import mechanize
br = mechanize.Browser()
br.open('http://example.com')
print(br.title())
Creates a Mechanize browser instance, opens a URL, and prints the page title.
link = br.find_link(text='More information')
br.follow_link(link)
print(br.geturl())
Finds a link by its text and navigates to the linked page.
br.select_form(nr=0)
br['username'] = 'myuser'
br['password'] = 'mypassword'
br.submit()
Selects the first form on the page, fills in the username and password fields, and submits the form.
br.set_cookiejar(mechanize.CookieJar())
br.open('http://example.com')
Enables cookie handling to maintain sessions across multiple requests.
br.addheaders = [('User-agent', 'Mozilla/5.0')]
br.open('http://example.com')
Adds a custom User-Agent header to mimic a real browser.
br.set_handle_redirect(True)
br.open('http://example.com')
Ensures the browser automatically follows HTTP redirects.
Always set a User-Agent header; many sites block the default Python user agent.
Use CookieJar to manage sessions when scraping multiple pages.
Avoid scraping websites without permission; respect robots.txt.
Use proper exception handling for HTTP errors and timeouts.
Combine Mechanize with BeautifulSoup for parsing page content efficiently.