23rd December, 2006

A Simple Python CPU Emulator /--evilbitz   

In computer science, an emulator is software that reproduces an environment in which other software can run. It may emulate some hardware devices as well, and when a complete environment is emulated it is usually called a virtual machine. An example of such an environment is DOSEMU, an emulated DOS environment designed to run on top of the Linux operating system. The core of any software emulator is the CPU emulation part. Many virtual machines reuse the CPU emulation created by the Bochs project, which runs x86 and AMD64 software (Bochs itself can also be hosted on other architectures, such as PPC).

I played a bit with emulators in the past, and I created a simple Python CPU emulator that emulates “flat” assembly code whose instruction set is defined by a configuration file. In this post I’ll describe its features, how it works and what can be achieved with it.

Simple Python CPU Emulator

The CPU emulator keeps a couple of run-time variables that are used by the parsers. It moves line by line over the assembly that you tell it to emulate, and if the current assembly line matches a pattern (defined in the configuration file), it executes the Python statement associated with that pattern against the run-time state of the CPU registers.

The configuration file of the emulator has two parts: it defines the registers that the CPU holds and the instructions that the emulated CPU can execute. The design is very simple and elegant: when the emulator initializes, it prepares a Python dictionary of opcodes. For instance, the opcode INC is defined as follows in the configuration file:

INC $reg1 | $reg1 += 1

When the parser encounters that configuration line, it adds the following key => value to the dictionary:

opcodes["INC (.*?)"] = "cpu[execed.group(1)] += 1"

Here cpu[execed.group(1)] is the run-time CPU register captured by the emulator's parser. When the parser reads a new assembly line, it first goes through all the keys of the opcodes dictionary and tries a regular expression match against each of them. If it finds one, the corresponding dictionary value is simply executed with exec. Here is the source code of the emulation loop:

while cur_op_line < end_of_ops:
    opcode = a_ops[cur_op_line]
    # get_exec() yields the regex match and the pattern that produced it
    execed, ptrn = get_exec(opcode)
    if execed:
        if a_verbosity:
            print("%d:\t%s" % (cur_op_line + 1, opcode))
        # Run the Python statement associated with the matched pattern
        exec(opcodes[ptrn])
    else:
        print("[-] haven't found a match for %s" % opcode)

    cur_op_line += 1
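To make the idea concrete, here is a minimal, self-contained sketch of the same mechanism. It is not the emulator's actual source: the names load_opcodes and run are made up for illustration, the placeholder handling is my assumption about how a line such as "INC $reg1 | $reg1 += 1" could be translated, and it only runs straight-line programs (jumps such as TEST are left out, since they need the loop to expose its line counter).

```python
import re

def load_opcodes(config_lines):
    """Turn lines like 'INC $reg1 | $reg1 += 1' into a {regex: statement} dict."""
    opcodes = {}
    for line in config_lines:
        pattern, action = [part.strip() for part in line.split("|", 1)]
        regex, group = "", 0
        # Split the pattern into literal text and $placeholders.
        for piece in re.split(r"(\$\w+)", pattern):
            if piece.startswith("$"):
                group += 1
                regex += r"(\w+)"
                # Each placeholder becomes a lookup of the matched register name.
                action = action.replace(piece, "cpu[execed.group(%d)]" % group)
            else:
                regex += re.escape(piece)
        opcodes[regex + "$"] = action
    return opcodes

def run(program, opcodes, registers):
    """Execute a straight-line program and return the final register state."""
    cpu = dict.fromkeys(registers, 0)
    for op in program:
        for regex, action in opcodes.items():
            execed = re.match(regex, op)
            if execed:
                # The statement sees only the register file and the match object.
                exec(action, {"cpu": cpu, "execed": execed})
                break
        else:
            print("[-] no match found for %s" % op)
    return cpu

opcodes = load_opcodes(["INC $reg1 | $reg1 += 1", "DEC $reg1 | $reg1 -= 1"])
state = run(["INC r0", "INC r0", "DEC r1"], opcodes, ["r0", "r1", "r2", "r3"])
```

Keeping the statement as a string and handing it to exec is what makes the instruction set configurable, at the price of trusting whatever is written in the configuration file.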

A Riddle For Fun

My friend Imri gave me this riddle when he took a course in complexity at Tel Aviv University.

You have a CPU with 4 registers (r0 through r3) and 3 basic opcodes that work as described:

  • INC reg – increases the value of reg by 1
  • DEC reg – decreases the value of reg by 1
  • TEST reg, line – if reg != 0 then start executing opcodes at the specified line

Download the source code, which comes with the configuration file for this exercise, and write a program that ends its execution when one of the registers contains the value 144.

  • The CPU starts with all the registers initialized to zero.
  • The program length must not exceed 20 instructions.

Have pun! :-)



Posted in design, programming | 2 Comments

20th December, 2006

The Greatest Resources On The Web /--evilbitz   

External links in Wikipedia are links to websites that contain information relevant to the article in which they appear; most often they are resources that the authors used to compile the article. I was curious to find out which 100 domain names Wikipedia articles link to most often.

I downloaded the Wikipedia backup from 11/30/2006 and wrote a PHP script to generate those statistics. I hereby release the statistics I managed to gather from the backup, along with the source code of the PHP script (please excuse the mess, and the CRLFs :-) ), for those of you who would like to play with it a bit more.

Technical Details

The PHP script moves through the backup file and finds the wikitext of each page (Wikipedia entry), then parses the wikitext and extracts the external links. I designed a simple database to hold the results: an articles table, which holds an id and a title for each article, and an external links table, which holds, for each external link, the domain name and the id of the article in which it appears. Once the database has been fed with the gathered data, the following query calculates the statistics: "select domain_name, count(*) from tbl_exlinks group by domain_name order by 2 desc limit 100;"

The PHP script creates a directory holding the generated .sql files, which are executed with the mysql command-line tool. I also wrote a Python script to help me execute them.
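For readers who want to try the counting step without a database, here is a rough Python sketch of the same idea. It is not the released PHP script: the URL regular expression is a simplification I chose for illustration, and external_domains and top_domains are hypothetical names. The Counter plays the role of the GROUP BY query.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified assumption: an external link is any http(s) URL in the wikitext,
# ending at whitespace or at a character that closes the link markup.
URL_RE = re.compile(r"https?://[^\s\]\|<>\"]+")

def external_domains(wikitext):
    """Return the domain name of every external link found in one page."""
    return [urlparse(url).netloc.lower() for url in URL_RE.findall(wikitext)]

def top_domains(pages, limit=100):
    """Count links per domain over all pages, most linked first."""
    counts = Counter()
    for wikitext in pages:
        counts.update(external_domains(wikitext))
    return counts.most_common(limit)

sample_pages = [
    "See [http://www.imdb.com/title/tt1 IMDb] and http://www.google.com/search?q=x",
    "[http://www.imdb.com/name/nm1 biography]",
]
ranking = top_domains(sample_pages, limit=2)
```

Note that, like the statistics in this post, this counts www.imdb.com and imdb.com as two different domain names.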

The Statistics

en.wikipedia.org has 679597 links.
www.google.com has 81208 links.
www.findagrave.com has 44549 links.
www.britannica.com has 43854 links.
babelfish.altavista.com has 37437 links.
news.bbc.co.uk has 29627 links.
www.allmusic.com has 20782 links.
www.imdb.com has 17460 links.
books.google.com has 17092 links.
www.geocities.com has 15135 links.
www.bbc.co.uk has 10355 links.
www.myspace.com has 10175 links.
www.google.co.uk has 7961 links.
www.nytimes.com has 7805 links.
tools.wikimedia.de has 7210 links.
www.amazon.com has 7185 links.
www.ncbi.nlm.nih.gov has 5848 links.
maps.google.com has 5816 links.
www.washingtonpost.com has 5494 links.
www.guardian.co.uk has 5434 links.
www.youtube.com has 5212 links.
www.opsi.gov.uk has 4993 links.
planetmath.org has 4979 links.
www.cnn.com has 4927 links.
web.archive.org has 4703 links.
www.jewishencyclopedia.com has 4622 links.
www.mindat.org has 4552 links.
www.newadvent.org has 4356 links.
www.webmineral.com has 4145 links.
www.baseball-reference.com has 3616 links.
www.cbc.ca has 3613 links.
www.pbs.org has 3587 links.
www.findarticles.com has 3414 links.
www.parl.gc.ca has 3295 links.
www.abc.net.au has 3162 links.
www.nationalregisterofhistoricplaces.com has 2875 links.
www.aafla.org has 2818 links.
www.tv.com has 2692 links.
www.rollingstone.com has 2673 links.
www.angelfire.com has 2663 links.
www.perseus.tufts.edu has 2639 links.
www.gutenberg.org has 2628 links.
www.flheritage.com has 2611 links.
news.yahoo.com has 2556 links.
www.lib.utexas.edu has 2505 links.
www.timesonline.co.uk has 2489 links.
www.history.navy.mil has 2478 links.
members.aol.com has 2472 links.
imdb.com has 2471 links.
www.flickr.com has 2471 links.
www.biographi.ca has 2412 links.
groups.yahoo.com has 2379 links.
www.nba.com has 2348 links.
sports.espn.go.com has 2322 links.
www.fallingrain.com has 2305 links.
www.nps.gov has 2271 links.
de.wikipedia.org has 2215 links.
www.globalsecurity.org has 2213 links.
www.time.com has 2162 links.
www.cricinfo.com has 2127 links.
www.telegraph.co.uk has 2110 links.
www.usatoday.com has 2107 links.
www.msnbc.msn.com has 2099 links.
www.ethnologue.com has 2060 links.
www.hockeydb.com has 2056 links.
video.google.com has 2042 links.
groups.google.com has 2006 links.
www.pantheon.org has 1993 links.
www-history.mcs.st-andrews.ac.uk has 1977 links.
www.mobygames.com has 1945 links.
www.smh.com.au has 1909 links.
bioguide.congress.gov has 1897 links.
query.nytimes.com has 1878 links.
www.census.gov has 1815 links.
www.npr.org has 1810 links.
www.bartleby.com has 1808 links.
www.submission.info has 1794 links.
www.un.org has 1781 links.
www.discogs.com has 1779 links.
www.cia.gov has 1777 links.
www.alexa.com has 1774 links.
www.microsoft.com has 1723 links.
www.theage.com.au has 1692 links.
www.forbes.com has 1686 links.
www.gamespot.com has 1669 links.
www.boston.com has 1665 links.
www.gcr1.com has 1644 links.
www.nscb.gov.ph has 1638 links.
www.defenselink.mil has 1631 links.
www.wizards.com has 1606 links.
www.navsource.org has 1599 links.
www.t-macs.com has 1582 links.
www.probertencyclopaedia.com has 1558 links.
www.uefa.com has 1552 links.
www.sfgate.com has 1542 links.
www.state.gov has 1525 links.
www.reuters.com has 1517 links.
www.archive.org has 1509 links.
adsabs.harvard.edu has 1508 links.
nces.ed.gov has 1486 links.



Posted in random | 5 Comments

13th December, 2006

HOWTO – Debugging a remote Windows HVM under Xen using Windbg /--evilbitz   

This HOWTO describes how to debug a Windows HVM domain under the Xen 3.0.3 virtual machine monitor using Windbg. It is taken for granted that you know how to debug a local Windows machine over a null-modem serial cable and that you know how to manage Xen virtual machines; these topics are not covered in this HOWTO.

In order to remotely debug a Windows HVM using Windbg, we’ll create a setup that allows us to do so. You’ll basically need a network connection (that supports TCP/IP) between the host machine and the target machine (the one being debugged).

Target Computer Configuration

Let’s start with the Windows virtual machine (HVM). Start Windows (using xm create…) and, once it has started, open the boot.ini file with your favourite text editor. Duplicate the boot line for the installation of Windows that you want to debug and, in the new line, add the following boot switches: "/debug /debugbreak /debugport=COM1" (the default serial baud rate is 19200).

Your new line should look something like this:

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect /debug /debugbreak /debugport=COM1

Save the file and shut down the VM, then open your HVM’s config file with a text editor. Make sure that the line "serial = 'pty'" exists there; if it doesn’t, add it, but be sure that your Xen version supports this “feature”.

Now, launch the VM and attach the STDIN/STDOUT of the console (mapped to the HVM’s COM1 device) to a socket that you create with netcat:

xm create xp1
netcat -lk -p 4444 -c "xm console domain_id"

The second command creates a TCP socket and listens on port 4444. The input and output of the HVM’s COM1 serial port are now redirected to that socket. Not all versions of netcat on Linux support the -c switch. If you don’t have the right version of netcat, you can download it from here. We are now ready to configure the host computer.
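If no netcat with -c is at hand, the same trick can be approximated in a few lines of Python. This is only a sketch under assumptions, not something I have tested against Xen: listen and serve_command are made-up helper names, and it relies on the command (xm console here) being happy to talk over a socket instead of a terminal, just as it has to with netcat -c.

```python
import socket
import subprocess

def listen(port, host="0.0.0.0"):
    """Create a listening TCP socket, like 'netcat -l -p PORT'."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server

def serve_command(server, command, once=False):
    """Run command for each client, using the client socket as stdin and stdout."""
    while True:
        conn, _addr = server.accept()
        with conn:
            # The child process reads from and writes to the socket directly,
            # the same way inetd hands a connection over to a service.
            subprocess.call(command, stdin=conn.fileno(), stdout=conn.fileno())
        if once:
            return

# Hypothetical usage on the Xen machine (blocks and serves forever):
#   serve_command(listen(4444), ["xm", "console", "domain_id"])
```

Each connection gets a fresh run of the command, which mirrors what the -k switch does for netcat.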

Host Computer Configuration

On the host machine, which runs a native installation of Windows, we will create a virtual serial port that is associated with the socket we created on the target side. To do that, we will use a tool called HW VSP (Virtual Serial Port), made by a company named Hw-Group, which specializes in hardware kits. The input and output of the serial port that you create with HW-VSP will be redirected to the target’s remote socket. Enter the target computer’s IP address and port (4444), and name your virtual serial port COM5. This closes the loop between the HVM’s COM1 serial port and the local virtual serial port that you created.

Before you create the virtual serial port, you’ll have to configure HW-VSP not to use NVT, a feature that encapsulates the data sent over the network for the virtual serial port, and not to use TEA-based authentication. See the following image:

HW-VSP Configuration

That’s pretty much it. Run Windbg, go to File->Kernel Debug... and choose the virtual serial port you created (COM5), specifying 19200 for the baud rate. Press OK. You are now ready to debug the HVM. Press ‘g’ to let Windows run after the debug break.

Some notes

It is also possible to debug a Windows HVM from a Windbg process that runs on another Windows HVM; this can be accomplished in pretty much the same way as described in this HOWTO.

Comments and suggestions are welcome!

Update – 19/08/2008

Instead of using netcat, you can use the inetd service to redirect the I/O of the virtual serial port to a socket. This is more useful and easier, since it is done automatically when you create the HVM. To do that, you will have to edit these files (I assume your HVM config file is called vistasp1):

  • /etc/inetd.conf – add this line at the end: "windbg_vistasp1 stream tcp nowait root /usr/sbin/tcpd xm console vistasp1"
  • /etc/services – add this line: "windbg_vistasp1 4444/tcp"


Posted in random | 5 Comments

9th December, 2006

An Intriguer Virus /--evilbitz   

Another post about viruses! YAY! And this time it’s even more sophisticated than my ecological computer viruses post. A few years ago I asked myself what kind of virus could cause the most damage to a certain company or organization. Well, I’m not that evil (even if my nickname suggests so), but it’s still a nice question to ponder. If you remember MyDoom, the worm that infected around 500,000 computers worldwide and launched a DDoS (Distributed Denial of Service) attack against The SCO Group’s website www.sco.com in 2004, you probably remember how much noise it made back then.

It made me think about an idea. Could a worm such as MyDoom lead a Big Company (BC) into legal trouble? Could it force that company into court? Well… the answer is yes; let me explain how. Consider a worm such as MyDoom, which spreads to a decent number of computers and launches a DNS DDoS attack, an attack that leverages the DNS protocol to amplify the traffic to ~73 times the size of the originally generated packets. Consider this power, and assume that if the worm launches the attack against BC’s website, the website will be down for a long period, causing BC to lose a huge portion of its income. But (and here comes the legal issue…) what if the worm had a predefined condition for this attack, such as the following: if BC’s website has a .txt file placed at a specific URL, containing something like 15 other domain names, each with a Google PageRank value above 6, then the DDoS attack moves to those websites, sparing BC’s website and letting its income keep flowing. Of course, those websites would be taken down and would blame BC for it. Lawsuits would follow and BC might have to compensate those websites for their damage, but it might still be cost-effective for BC to place that .txt file, since the compensation would cost much less than the losses BC would have suffered had it not placed the file.

Many side effects emerge, and this idea can be developed further. I’ll just bring up another “fun” thing that can be done with such power. A creative worm could make Big Company #1 (BC1) and Big Company #2 (BC2) play chess against each other: the worms would read a .txt file at both BC1 and BC2 and let them play on a predefined board stored in the worm itself. Every 5 minutes, each company would have to make a move. The one that loses suffers the DDoS attack. The worms would synchronize among themselves in a P2P manner to prevent the companies from tricking them.

Well, that’s about it. I hope this post was as fun to read as it was for me to write :-)

Until the next time… CYa.



Posted in security, virus | Be The First To Comment!

8th December, 2006

Interrupts and Interrupt-Controllers /--evilbitz   

Abstract

This article is a kind of continuation of my under-the-hood article series; you can take a look at my previous articles regarding the PCI bus. In this article I delve deeper into interrupts, and we’ll explore what exactly happens when an interrupt occurs.

If you are a software developer, this information will help you grasp the PC architecture a little bit better. This article’s intent is to encourage people to explore things and widen their knowledge… I see too many people who are trapped inside their operating system’s environment, especially if they are using the M$ black box. This is the only reason I see for switching to Linux :-) OK, so let’s start!

Interrupts

An interrupt is basically a signal from the hardware that tells the software to perform an operation. It is handled by the operating system that calls the ISR (Interrupt Service Routine) for that interrupt request (IRQ).

Generally, we distinguish between two types of interrupts: edge-triggered and level-triggered. Edge-triggered interrupts are caused by a change in the level of the bus line, that is, a transition from 1 to 0 or from 0 to 1 (falling edge and rising edge, respectively). This “old-fashioned” type of interrupt was used on the ISA bus. The problem with it is that it is difficult to share: several devices couldn’t share the same IRQ line. Level-triggered interrupts are caused by raising or lowering the level of the bus line and holding it there until the interrupt is serviced. They were used in the original PCI bus (and are still in use) as the standard type of interrupt.
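A toy model shows why sharing an edge-triggered line is awkward. The sketch below is purely illustrative and not taken from any real controller: it assumes an active-high line that is the logical OR of the devices wired to it, and edge_events and shared_line are made-up names. When two devices assert the line at overlapping times, the controller sees a single rising edge, so one request can go unnoticed.

```python
def edge_events(samples):
    """Count rising edges (0 -> 1 transitions) on a sampled line."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 0 and cur == 1)

def shared_line(*devices):
    """Combine several device outputs onto one IRQ line (logical OR)."""
    return [int(any(levels)) for levels in zip(*devices)]

# Two devices request service at overlapping times.
dev_a = [0, 1, 1, 1, 0, 0]
dev_b = [0, 0, 1, 1, 1, 0]
line = shared_line(dev_a, dev_b)

requests_made = edge_events(dev_a) + edge_events(dev_b)  # one per device
edges_seen = edge_events(line)  # what an edge-triggered input observes
```

With level triggering the line simply stays asserted while any device still needs service, so after the first ISR finishes the controller can tell that something is still pending.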

Interrupt Controllers

The interrupt controller’s goal is to provide interrupt capabilities to the main processor (CPU) through a single line. When a device issues an interrupt, it is delivered to one of the interrupt controller’s IRQs (pins). From there the interrupt is raised in the CPU, which in turn asks the interrupt controller for the source of the interrupt through a special register that is held and managed by the interrupt controller. Old interrupt controllers, such as the PIC (Programmable Interrupt Controller), provided interrupt priority, interrupt masking and general flexibility for dealing with interrupts in the platform. Old devices were programmed to use fixed IRQs, and problems arose when two or more devices shared the same IRQ: if the PIC was programmed for edge-triggered mode, serious conflicts could cause the system to hang or not function at all. Well… in edge-triggered mode interrupts actually could be shared, if the devices were specially built for it. But since the operating system must run all the ISRs in the chain associated with the signaled IRQ, it is not so effective after all.

In level-triggered mode, the PIC knew how to share IRQs between different devices, but sharing interrupts is not a good deal in any case: it leads to performance issues and to faults caused by poorly written device drivers.

Consider the following scenario: Device A, which shares its IRQ with Device B, signals its driver; the interrupt is issued and the operating system processes it. The chain of ISRs contains two different ISRs. First, the driver for Device B processes the interrupt because it is first in the chain, and it decides to handle the interrupt (because its poorly written ISR handles any interrupt). The operating system sees that the interrupt was handled, stops executing the ISR chain and acknowledges the PIC. After some time, Device A sees that the interrupt wasn’t handled by its driver and issues the same interrupt again. This scenario leads to interrupt storms or causes Device A to stop functioning.

To solve this problem and allow more flexibility, Advanced Programmable Interrupt Controllers (APICs) were introduced. They contain more IRQs and their function is better adapted to the operating system. Windows, for example, uses its IRQL mechanism to mask interrupts in the APIC; this is accomplished with a single mov assembly instruction, instead of running several I/O port instructions (such as in or out) as is the case with a PIC.

APICs are used in multiprocessor environments: each CPU has its own Local APIC (LAPIC), and another controller, called the I/O APIC, routes interrupts from the bus to the LAPICs. This allows greater flexibility, and sharing interrupts is no longer an issue, since a typical I/O APIC provides 24 interrupt inputs. Another advantage is that each LAPIC has its own timer, which allows each CPU to better schedule the CPU time distributed between threads in quantum units. APICs also support IPIs (Inter-Processor Interrupts), a way for one processor to interrupt another. IPIs are used for synchronization and cache coherency. The I/O APIC’s function is to distribute interrupts between the CPUs in a multiprocessor environment.
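The scenario is easy to replay in a few lines of Python. This is a toy model rather than how any real kernel is written: dispatch, good_isr, greedy_isr and count_attempts are names invented for the illustration, and an ISR is reduced to a function that returns True when it claims the interrupt.

```python
def dispatch(irq_chain, source):
    """Walk the ISR chain of a shared IRQ until one ISR claims the interrupt."""
    for driver, isr in irq_chain:
        if isr(source):
            return driver  # the OS stops here and acknowledges the controller
    return None

def good_isr(device):
    """Well-behaved ISR: claims the interrupt only if its own device raised it."""
    return lambda source: source == device

def greedy_isr(device):
    """Poorly written ISR: claims every interrupt on the line."""
    return lambda source: True

def count_attempts(irq_chain, source, owner, max_retries=5):
    """Count how often the device must raise its interrupt until its own driver runs."""
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        if dispatch(irq_chain, source) == owner:
            break
    return attempts

bad_chain = [("driver_b", greedy_isr("B")), ("driver_a", good_isr("A"))]
fixed_chain = [("driver_b", good_isr("B")), ("driver_a", good_isr("A"))]
storm = count_attempts(bad_chain, "A", "driver_a")    # gives up after max_retries
normal = count_attempts(fixed_chain, "A", "driver_a")  # handled on the first try
```

With the greedy ISR first in the chain, Device A's driver is never reached, no matter how many times the interrupt is raised again.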

Final Words

If you bore with me to this point, that is surely admirable! Even I couldn’t bear with myself writing this post :-)

Anyway, we saw how interrupts are generated and how they are routed on platforms based on the PIC and APIC implementations, and we also took a look at how the operating system handles interrupts. I hope you enjoyed this post.

Evilbitz.



Posted in design, lowlevel, programming | 15 Comments
