22nd October, 2006

Python IE Automation – Thorough Tutorial   

Posted in hacking | by evilbitz |



I haven’t seen much information on this topic, so I thought I’d post something about it:

Python IE automation is extremely easy using the InternetExplorer.Application COM object. With this COM object you can make IE do all kinds of things, such as automating login processes, downloading files or creating some underground bots ;)

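Here is how to acquire an interface to InternetExplorer.Application (the win32com module comes with the pywin32 extensions, which are not part of the standard Python distribution):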

>>> from win32com.client import Dispatch
>>> ie = Dispatch("InternetExplorer.Application")
>>> ie.Visible = True
>>>
>>> # navigate to your favourite website
>>> website_address = "http://www.example.com/"  # any URL you like
>>> ie.Navigate(website_address)
>>>
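
If you plan to reuse this from a script, the steps above can be wrapped in a function. This is only a sketch: `open_browser` is a hypothetical helper (not part of pywin32), and the `dispatch` parameter is an assumption added so the function can be exercised without Windows; by default it falls back to `win32com.client.Dispatch`.

```python
def open_browser(url, visible=True, dispatch=None):
    """Start InternetExplorer.Application, show it and navigate to url.

    open_browser is a hypothetical helper; `dispatch` defaults to
    win32com.client.Dispatch, which needs pywin32 on Windows.
    """
    if dispatch is None:
        try:
            from win32com.client import Dispatch as dispatch
        except ImportError as exc:
            raise ImportError(
                "win32com not found: install the pywin32 package (Windows only)"
            ) from exc
    ie = dispatch("InternetExplorer.Application")
    ie.Visible = visible  # show the browser window
    ie.Navigate(url)      # starts loading; does not wait for the page to finish
    return ie


# Usage on Windows with pywin32 installed:
# ie = open_browser("http://www.example.com/")
```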

Your browser should now navigate to the address you specified. Once it has finished loading the page, you can start processing the results.

This is how you wait for the page to finish loading:

>>> from time import sleep
>>> while ie.ReadyState != 4:  # 4 == READYSTATE_COMPLETE
...     sleep(1)
...
>>>
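
The loop above spins forever if the page never finishes loading. A slightly more defensive sketch also checks the browser's `Busy` flag and gives up after a timeout. The function name and the default timeout and polling values are my own choices, not part of the COM API; only `Busy`, `ReadyState` and the value 4 (READYSTATE_COMPLETE) come from the InternetExplorer object.

```python
import time

READYSTATE_COMPLETE = 4  # ReadyState value meaning "page fully loaded"


def wait_until_ready(browser, timeout=30.0, poll_interval=0.5):
    """Block until browser has finished loading the current page.

    browser is the object returned by Dispatch("InternetExplorer.Application").
    Raises TimeoutError if loading takes longer than `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while browser.Busy or browser.ReadyState != READYSTATE_COMPLETE:
        if time.monotonic() >= deadline:
            raise TimeoutError("page not loaded after %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Usage:
# ie.Navigate("http://www.example.com/")
# wait_until_ready(ie, timeout=60)
```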

Once the page has loaded, you can get an interface to its document object (ie.Document). This is the same document object that JavaScript and VBScript code on the page works with.

This gives you complete DOM control (domination!) over the page you last navigated to.

So let’s see what we can do with it:

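>>> from time import sleep
>>> ie.Navigate("http://search.msn.com/")
>>> while ie.ReadyState != 4:  # let the page load before touching the DOM
...     sleep(1)
...
>>> ie.Document.getElementById("q").value = "SinglePageMarketing"
>>> ie.Document.getElementById("srch_btn").click()  # IDs depend on the page's markup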
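
Element IDs such as "q" and "srch_btn" belong to the search page's markup at the time of writing and can change without notice (one of the comments below reports exactly that). A hypothetical helper that fails loudly when an ID is missing, instead of raising an obscure error on a `None` object, might look like this. It assumes that getElementById returns None for a missing element, as a null DOM result normally does through win32com.

```python
def fill_and_submit(document, field_id, text, button_id):
    """Type text into the input with id field_id, then click button_id.

    Hypothetical helper: assumes getElementById returns None for a
    missing element. Nothing is changed if either element is missing.
    """
    field = document.getElementById(field_id)
    button = document.getElementById(button_id)
    missing = [name for name, elem in ((field_id, field), (button_id, button))
               if elem is None]
    if missing:
        raise LookupError("element id(s) not found on page: %s" % ", ".join(missing))
    field.value = text  # same as setting .value from JavaScript
    button.click()      # a submit button normally starts a new page load


# Usage, once the search page has loaded:
# fill_and_submit(ie.Document, "q", "SinglePageMarketing", "srch_btn")
```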

OK, now what about parsing the results?
Once the results page has loaded (wait on ReadyState just as before), we can either walk the DOM or parse the text ourselves. I chose the latter because it’s easier:

>>> result = ie.Document.body.innerHTML
>>> len(result)
5619
>>>
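
For comparison, the DOM approach mentioned above avoids text parsing altogether. This sketch assumes that document.links behaves through COM the way it does in JavaScript, with a `length` property and an `item(index)` method; `collect_links` is a hypothetical name.

```python
def collect_links(document):
    """Return the href of every link (<a> or <area> with href) in a DOM document.

    Assumes document.links exposes `length` and `item(index)`, as the
    JavaScript collection does.
    """
    links = document.links
    return [links.item(i).href for i in range(links.length)]


# Usage, once the results page has loaded:
# for url in collect_links(ie.Document):
#     print(url)
```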

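Note that the result text is a Unicode string. To convert it to Latin-1, use the encode method; the 'ignore' argument silently drops any character that has no Latin-1 equivalent. Under Python 3 this produces a bytes object, so any regular expressions you apply to it afterwards must be bytes patterns as well: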

>>> result = result.encode('latin-1', 'ignore')
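
To see what 'ignore' actually does, compare it with 'replace' on a made-up sample string containing one character outside Latin-1 (a snowman, U+2603). The `to_latin1` wrapper is just for illustration.

```python
def to_latin1(text, errors="ignore"):
    """Encode a Unicode string as Latin-1 bytes.

    errors="ignore" silently drops characters Latin-1 cannot represent;
    errors="replace" substitutes "?" for them instead.
    """
    return text.encode("latin-1", errors)


sample = "caf\u00e9 \u2603"  # "café " followed by a snowman (not in Latin-1)
print(to_latin1(sample))             # b'caf\xe9 '
print(to_latin1(sample, "replace"))  # b'caf\xe9 ?'
```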

OK, now let’s get a list of all the links the search engine found:

>>> import re
>>> re.findall("your favourite regexp", result)
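
As one concrete choice of regexp, the sketch below pulls quoted href values out of <a> tags. Regular expressions are brittle on HTML, and IE's innerHTML may leave some attribute values unquoted, so a second function does the same job with the standard library's html.parser module, which copes with unquoted values. Both function names are hypothetical, and both expect a str; if you encoded the result to bytes under Python 3, decode it first, e.g. with result.decode('latin-1').

```python
import re
from html.parser import HTMLParser

# Matches <a ... href="..."> or href='...'; unquoted values are not handled.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def links_by_regex(html):
    """Return quoted href values of <a> tags, in document order."""
    return HREF_RE.findall(html)


class _LinkCollector(HTMLParser):
    """Collects href attributes of <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_by_parser(html):
    """Return href values of <a> tags using the stdlib HTML parser."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="http://a.example/">A</a> <A HREF=b.html>B</A></p>'
print(links_by_regex(page))   # ['http://a.example/']
print(links_by_parser(page))  # ['http://a.example/', 'b.html']
```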

Well, that’s it! Now you know the basics; it’s up to you to build your own tools on top of them!



There are currently 4 responses to “Python IE Automation – Thorough Tutorial”


  1. On December 27th, 2007, Antonio Xavier said:

    Hi

    Is it possible to view IE’s security certificate via a Python script? If so, can you please post some sample code?

    Thanks & Rgds

    Antonio Xavier

  2. On August 5th, 2008, David said:

    I think your useful example script should be updated; the line:

    >>> ie.document.getElementById("srch_btn").click()

    must be changed to:

    >>> ie.document.getElementById("go").click()

    in order to work.

    Regards,
    David

  3. On December 11th, 2008, vipsy said:

    It’s much cleaner and type-safe to do it in C#.
    It’s easy too.

  4. On May 14th, 2009, silvere said:

    Why doesn’t my Python have win32com?

    >>> from win32com.client import Dispatch

    Traceback (most recent call last):
      File "", line 1, in <module>
        from win32com.client import Dispatch
    ImportError: No module named win32com.client
