Python IE Automation – Thorough Tutorial
Posted in hacking | by evilbitz
I haven’t seen much information on this topic, so I thought I’d post something about it.
Python IE automation is extremely easy using the InternetExplorer.Application COM object. With it you can make IE do all kinds of things, such as automating a login process, downloading files, or building some underground bots.
Here is how to acquire an interface to InternetExplorer.Application (you’ll need Windows, Internet Explorer, and the pywin32 package, which provides win32com):
>>> from win32com.client import Dispatch
>>> ie = Dispatch("InternetExplorer.Application")
>>> ie.Visible = True
>>>
>>> # navigate to your favourite website
>>> website_address = "http://www.example.com/"  # placeholder: use any URL
>>> ie.Navigate(website_address)
>>>
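Every script that starts IE should also close it, or stray browser processes pile up. Here is a minimal sketch, assuming pywin32 on Windows, that wraps the steps above in a context manager and calls the InternetExplorer object’s Quit method on the way out. The function name internet_explorer and its dispatch parameter are hypothetical; dispatch exists only so the function can be tried with a stand-in object.

```python
from contextlib import contextmanager


@contextmanager
def internet_explorer(visible=True, dispatch=None):
    """Yield an InternetExplorer.Application COM object, then quit it.

    `dispatch` is a hypothetical hook for substituting a fake object;
    by default win32com's Dispatch is used (Windows + pywin32 only).
    """
    if dispatch is None:
        # Imported lazily so the module can be loaded without pywin32.
        from win32com.client import Dispatch as dispatch
    ie = dispatch("InternetExplorer.Application")
    ie.Visible = visible
    try:
        yield ie
    finally:
        ie.Quit()  # close the browser even if the with-body raised
```

Used as "with internet_explorer() as ie:" followed by ie.Navigate(...) inside the block, the browser window is closed even if your script fails halfway through.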
Your browser should now navigate to the address you specified. Once it has finished loading the page, you can start processing the results.
This is how you wait for the page to finish loading:
>>> from time import sleep
>>> while ie.Busy or ie.ReadyState != 4:  # 4 == READYSTATE_COMPLETE
...     sleep(1)
...
>>>
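A polling loop like this can spin forever if a page never finishes loading. A slightly more defensive sketch adds a timeout. The names wait_for_page, timeout, poll_interval and READYSTATE_COMPLETE are choices made here for illustration; 4 is the value of READYSTATE_COMPLETE in the READYSTATE enumeration that ReadyState reports, and checking Busy as well is an extra precaution. It accepts any object with Busy and ReadyState attributes, such as the ie object from Dispatch.

```python
import time

READYSTATE_COMPLETE = 4  # value from the READYSTATE enumeration


def wait_for_page(ie, timeout=30.0, poll_interval=0.5):
    """Poll until the browser has finished loading.

    Returns True once Busy is false and ReadyState equals
    READYSTATE_COMPLETE, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not ie.Busy and ie.ReadyState == READYSTATE_COMPLETE:
            return True
        time.sleep(poll_interval)
    return False
```

Returning False instead of raising leaves it to the caller to decide whether a slow page means retrying, reloading, or giving up.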
When the page is done loading, you can get an interface to the document object: the same document object that JavaScript and VBScript code running in the page sees.
This gives you complete DOM control (domination!) over the page you last navigated to.
So let’s see how we can do some nice things with it:
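>>> ie.Navigate("http://search.msn.com/")
>>> while ie.Busy or ie.ReadyState != 4:  # wait for the page before touching the DOM
...     sleep(1)
...
>>> ie.Document.getElementById("q").value = "SinglePageMarketing"
>>> ie.Document.getElementById("srch_btn").click()
>>>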
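Setting a few values and clicking a button is also all that automating a login form boils down to, so the pattern can be packaged as a small helper. This is a sketch: fill_and_submit is a hypothetical name, and the element ids in any real call depend on the target page. It raises LookupError when an id is missing, assuming (as win32com typically does) that getElementById hands back None when nothing matches.

```python
def fill_and_submit(document, fields, submit_id):
    """Set form fields by element id, then click the submit control.

    `fields` maps element ids to the values to enter. The ids are
    whatever the target page uses; none are built in here.
    """
    for element_id, value in fields.items():
        element = document.getElementById(element_id)
        if element is None:  # assumed: no match comes back as None
            raise LookupError(f"no element with id {element_id!r}")
        element.value = value
    button = document.getElementById(submit_id)
    if button is None:
        raise LookupError(f"no element with id {submit_id!r}")
    button.click()
```

For the search example this would be fill_and_submit(ie.Document, {"q": "SinglePageMarketing"}, "srch_btn"); for a login page you would pass the ids of its username and password fields and its submit button.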
OK, now what about parsing the results?
We can do this with a DOM-based approach or by parsing the raw text ourselves; I chose the latter because it’s easier.
>>> result = ie.Document.body.innerHTML
>>> len(result)
5619
>>>
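For comparison, here is a sketch of the DOM approach mentioned above, applied to the same job the regular expression handles below: collecting link targets. collect_links is a hypothetical helper; it assumes getElementsByTagName returns a collection exposing a length property and an item(index) method, which is how MSHTML element collections are usually accessed through win32com.

```python
def collect_links(document):
    """Return the href of every <a> element, walking the DOM."""
    anchors = document.getElementsByTagName("a")
    links = []
    for index in range(anchors.length):
        href = anchors.item(index).href
        if href:  # skip anchors that have no href (e.g. named anchors)
            links.append(href)
    return links
```

Calling collect_links(ie.Document) after the results page has loaded should normally give absolute URLs, since the href property is resolved against the page address, whereas the raw attribute text in innerHTML may be relative.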
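Note that the result text is Unicode. To convert it to Latin-1, use the encode method (this step is for Python 2; on Python 3 you can skip it and keep working with the str, because a str regex pattern cannot be used on bytes):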
>>> result = result.encode('latin-1', 'ignore')
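To see what the 'ignore' argument does, here is a tiny Python 3 illustration with a made-up string: characters that exist in Latin-1 survive, anything else is silently dropped, whereas 'replace' substitutes a question mark instead.

```python
# Made-up sample text: U+00E9 ("é") exists in Latin-1, U+2603 (a snowman) does not.
text = "caf\u00e9 \u2603"

ignored = text.encode("latin-1", "ignore")    # unencodable characters are dropped
replaced = text.encode("latin-1", "replace")  # unencodable characters become "?"

print(ignored)   # b'caf\xe9 '
print(replaced)  # b'caf\xe9 ?'
```

Because 'ignore' loses data without any warning, 'replace' can be the safer choice while you are still checking what a page actually contains.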
OK, now let’s get a list of all the links that the search engine found:
>>> import re
>>> re.findall(r"your favourite regexp", result)
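If you have no favourite regexp yet, the sketch below is one possible starting point; the pattern is an assumption, not the one from this post. It is case-insensitive and tolerates unquoted attribute values, because older IE versions may serialize innerHTML that way (e.g. <A href=http://...>). It expects a str, so on Python 3 skip the Latin-1 encode step.

```python
import re

# Assumed pattern: captures the value of every href attribute, whether it
# is double-quoted, single-quoted, or unquoted, in any letter case.
HREF_PATTERN = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def find_links(html):
    """Return every href value found in `html`, in document order."""
    return HREF_PATTERN.findall(html)


sample = '<A href=http://a.com/x>one</A> <a href="https://b.org/y">two</a>'
print(find_links(sample))  # ['http://a.com/x', 'https://b.org/y']
```

Regular expressions over HTML are brittle, so expect to adapt the pattern to the page you are scraping, or fall back to walking the DOM when the markup gets complicated.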
Well, that’s it! Now you know the basics; it’s up to you to build your own tools on top of them.