IDAPython: Instruction-Level Operations

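Working at instruction granularity with IDAPython mostly comes down to two things:

How to get from a function address to its instructions
How to analyze an instruction: the mnemonic and the operands, which in turn covers the instruction type, the operand types, and the operand values

Quick reference for the APIs used: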

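Module	API prototype	Description
idautils	FuncItems(ea)	Returns the addresses of all instructions in the function containing ea
idc	GetDisasm(ea)	Returns the disassembly of the instruction at ea (string)
idc	next_head(ea)	Returns the address of the next instruction (item head) after ea
idc	prev_head(ea)	Returns the address of the previous instruction (item head) before ea
idc	next_addr(ea)	Returns ea+1
idc	prev_addr(ea)	Returns ea-1
idc	print_insn_mnem(ea)	Returns the mnemonic of the instruction at ea (string)
idc	print_operand(ea, n)	Returns operand n of the instruction at ea (0-based, string)
idc	get_operand_type(ea, n)	Returns the type of operand n of the instruction at ea
idc	get_operand_value(ea, n)	Returns the value of operand n of the instruction at ea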

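Instructions

idautils.FuncItems(ea) returns all the instructions of the function that contains the address ea, and idc.GetDisasm(ea) returns the disassembly of a single instruction, as shown below. The demo program is cgibin, with the cursor in its main function.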

import idautils
import idc

dism_items = idautils.FuncItems(idc.here())
for item in dism_items:
    print(hex(item), idc.GetDisasm(item))

'''
Python>
0x402da0 addiu   $sp, -0x30
0x402da4 sw      $ra, 0x20+var_sC($sp)
0x402da8 sw      $s2, 0x20+var_s8($sp)
0x402dac sw      $s1, 0x20+var_s4($sp)
0x402db0 sw      $s0, 0x20+var_s0($sp)
0x402db4 li      $gp, 0x43F7F0
0x402dbc sw      $gp, 0x20+var_10($sp)
0x402dc0 move    $s1, $a1
0x402dc4 la      $t9, strrchr
...
'''

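idautils.FuncItems(ea) yields the address of every instruction in the function containing ea; it returns an iterator. idc.GetDisasm(ea), which already appeared in the first section, returns the disassembly of an instruction.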
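As a small illustration of iterating over FuncItems, the sketch below counts how often each mnemonic occurs in a function. The helper name mnemonic_histogram and its api parameter are made up for this example; api is assumed to be the idc module when run inside IDA, and it is passed in only so the function can be tried outside IDA with a stand-in object.

```python
from collections import Counter

def mnemonic_histogram(items, api):
    """Count mnemonics over an iterable of instruction addresses.

    items: instruction addresses, e.g. idautils.FuncItems(ea) inside IDA.
    api:   object providing print_insn_mnem(ea); inside IDA, pass the idc module.
    """
    return Counter(api.print_insn_mnem(ea) for ea in items)

# Inside IDA (assumption: run with the cursor somewhere in a function):
#   import idautils, idc
#   hist = mnemonic_histogram(idautils.FuncItems(idc.here()), idc)
#   for mnem, count in hist.most_common():
#       print(mnem, count)
```

A quick histogram like this gives a rough feel for a function (many sw/lw suggests heavy stack traffic, for instance) before reading it line by line.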
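To get the address of the instruction before or after the current instruction ea, use idc.prev_head and idc.next_head respectively. They simply walk the code in address order: if the current instruction is a jump, they do not follow it to the jump target. These two APIs are handy for iterating over the instructions in an address range.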

Python>ea = here()
Python>print(hex(ea))
0x402da0
Python>idc.next_head(ea)
0x402da4
Python>idc.prev_head(ea)
0x402d9c
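The range iteration mentioned above can be sketched as follows. The generator name iter_heads and its api parameter are assumptions made for this example; inside IDA, api would be the idc module, whose next_head accepts an optional upper bound as its second argument and returns BADADDR once no item is left below that bound. Note that heads include data items as well as instructions.

```python
def iter_heads(start, end, api):
    """Yield the address of every item head in [start, end).

    start is yielded as-is, so it should already point at the start
    of an instruction or data item.
    api: object providing next_head(ea, maxea); inside IDA, pass idc.
    """
    ea = start
    while ea < end:
        yield ea
        # When no head is left below `end`, next_head returns BADADDR,
        # which is larger than any valid `end`, so the loop terminates.
        ea = api.next_head(ea, end)

# Inside IDA (addresses taken from the cgibin listing above):
#   import idc
#   for ea in iter_heads(0x402da0, 0x402dc8, idc):
#       print(hex(ea), idc.GetDisasm(ea))
```

IDAPython also ships idautils.Heads(start, end), which serves the same purpose; the hand-written loop is shown only to make the role of next_head explicit.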

Do not confuse them with idc.next_addr and idc.prev_addr, which mechanically increment or decrement the address by one:

Python>ea = here()
Python>print(hex(ea))
0x402da0
Python>idc.next_addr(ea) # add 1
0x402da1
Python>idc.prev_addr(ea) # subtract 1
0x402d9f

Operands

An instruction can be broken down into its mnemonic, first operand, second operand, and so on, as shown below. Note that every part is returned as a string.

Python>idc.GetDisasm(ea)         # the whole instruction
'addiu   $sp, -0x30'
Python>idc.print_insn_mnem(ea)   # mnemonic
'addiu'
Python>idc.print_operand(ea, 0)  # first operand
'$sp'
Python>idc.print_operand(ea, 1)  # second operand
'-0x30'
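The console session above can be wrapped into one helper that returns the mnemonic together with all operand strings. This is only a sketch: split_insn, api, and max_operands are names invented here, api stands for the idc module inside IDA, and the limit of 8 operand slots is an assumption about IDA's per-instruction maximum. The loop stops at the first empty string, which is what print_operand gives back for an unused slot.

```python
def split_insn(ea, api, max_operands=8):
    """Return (mnemonic, [operand strings]) for the instruction at ea.

    api: object providing print_insn_mnem(ea) and print_operand(ea, n);
         inside IDA, pass the idc module.
    """
    mnem = api.print_insn_mnem(ea)
    operands = []
    for n in range(max_operands):
        text = api.print_operand(ea, n)
        if not text:          # empty result: no operand in this slot
            break
        operands.append(text)
    return mnem, operands

# Inside IDA:
#   import idc
#   print(split_insn(idc.here(), idc))
```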

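Operand type

Sometimes the type of an operand matters. For example, when a command-execution function receives its argument in a register, is the value an immediate (the address of a command string) or a memory location (a stack variable of the function)? idc.get_operand_type(ea, n) answers this, where ea is the address of the current instruction and n is the 0-based operand index, so n=0 is the first operand.
The operand types are listed below. Annoyingly, there are no MIPS-specific entries, and in my own tests the operand-type check was not usable on MIPS either, so there the only option is to compare the strings returned by idc.print_operand: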

o_void     = ida_ua.o_void      # No Operand                           ----------
o_reg      = ida_ua.o_reg       # General Register (al,ax,es,ds...)    reg
o_mem      = ida_ua.o_mem       # Direct Memory Reference  (DATA)      addr
o_phrase   = ida_ua.o_phrase    # Memory Ref [Base Reg + Index Reg]    phrase
o_displ    = ida_ua.o_displ     # Memory Reg [Base Reg + Index Reg + Displacement] phrase+addr
o_imm      = ida_ua.o_imm       # Immediate Value                      value
o_far      = ida_ua.o_far       # Immediate Far Address  (CODE)        addr
o_near     = ida_ua.o_near      # Immediate Near Address (CODE)        addr
o_idpspec0 = ida_ua.o_idpspec0  # Processor specific type
o_idpspec1 = ida_ua.o_idpspec1  # Processor specific type
o_idpspec2 = ida_ua.o_idpspec2  # Processor specific type
o_idpspec3 = ida_ua.o_idpspec3  # Processor specific type
o_idpspec4 = ida_ua.o_idpspec4  # Processor specific type
o_idpspec5 = ida_ua.o_idpspec5  # Processor specific type

# x86
o_trreg  =       ida_ua.o_idpspec0      # trace register
o_dbreg  =       ida_ua.o_idpspec1      # debug register
o_crreg  =       ida_ua.o_idpspec2      # control register
o_fpreg  =       ida_ua.o_idpspec3      # floating point register
o_mmxreg  =      ida_ua.o_idpspec4      # mmx register
o_xmmreg  =      ida_ua.o_idpspec5      # xmm register

# arm
o_reglist  =     ida_ua.o_idpspec1      # Register list (for LDM/STM)
o_creglist  =    ida_ua.o_idpspec2      # Coprocessor register list (for CDP)
o_creg  =        ida_ua.o_idpspec3      # Coprocessor register (for LDC/STC)
o_fpreglist  =   ida_ua.o_idpspec4      # Floating point register list
o_text  =        ida_ua.o_idpspec5      # Arbitrary text stored in the operand
o_cond  =        (ida_ua.o_idpspec5+1)  # ARM condition as an operand

# ppc
o_spr  =         ida_ua.o_idpspec0      # Special purpose register
o_twofpr  =      ida_ua.o_idpspec1      # Two FPRs
o_shmbme  =      ida_ua.o_idpspec2      # SH & MB & ME
o_crf  =         ida_ua.o_idpspec3      # crfield      x.reg
o_crb  =         ida_ua.o_idpspec4      # crbit        x.reg
o_dcr  =         ida_ua.o_idpspec5      # Device control register
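For the MIPS fallback described above (comparing the strings from idc.print_operand), a rough classifier could look like the sketch below. It is purely text-based and does not call IDA at all; the function name, the category labels, and the patterns are assumptions derived from the operand strings in the cgibin listing ('$sp', '-0x30', '0x20+var_sC($sp)', 'strrchr'), so other operand spellings may need extra cases.

```python
import re

# Something followed by a base register in parentheses, e.g. 0x20+var_sC($sp)
_DISPL = re.compile(r".*\(\$\w+\)$")

def classify_mips_operand(text):
    """Guess the kind of a MIPS operand from its printed form.

    Returns one of: "void", "displ", "reg", "imm", "other".
    """
    text = text.strip()
    if not text:
        return "void"
    if _DISPL.match(text):
        return "displ"            # memory access relative to a register
    if text.startswith("$"):
        return "reg"              # MIPS registers are printed as $name
    try:
        int(text, 0)              # accepts 0x.. and decimal, with optional sign
        return "imm"
    except ValueError:
        return "other"            # symbol names such as strrchr

# Inside IDA: classify_mips_operand(idc.print_operand(ea, 1))
```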

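Operand value

Knowing the operand type is not enough. For example, it is not sufficient to know that the first operand of an instruction is a register; you also need the register number. idc.get_operand_value(ea, n) returns the value of operand n (0-based) of the instruction at ea. The API is a thin wrapper over ida_ua; its source is shown below.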

def get_operand_value(ea, n):
    """
    Get number used in the operand

    This function returns an immediate number used in the operand

    @param ea: linear address of instruction
    @param n: the operand number

    @return: value
        operand is an immediate value  => immediate value
        operand has a displacement     => displacement
        operand is a direct memory ref => memory address
        operand is a register          => register number
        operand is a register phrase   => phrase number
        otherwise                      => -1
    """
    insn = ida_ua.insn_t()
    inslen = ida_ua.decode_insn(insn, ea)
    if inslen == 0:
        return -1
    op = insn.ops[n]
    if not op:
        return -1

    if op.type in [ ida_ua.o_mem, ida_ua.o_far, ida_ua.o_near, ida_ua.o_displ ]:
        value = op.addr
    elif op.type == ida_ua.o_reg:
        value = op.reg
    elif op.type == ida_ua.o_imm:
        value = op.value
    elif op.type == ida_ua.o_phrase:
        value = op.phrase
    else:
        value = -1
    return value

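The docstring spells out what is returned for each kind of operand:

Operand is an immediate: the immediate value
Operand has a displacement: the displacement (op.addr in the code above)
Operand is a direct memory reference: the memory address
Operand is a register: the register number
Operand is a register phrase: the phrase number (op.phrase in the code above)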
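To make the pairing of operand type and returned value concrete, here is a small sketch that turns a (type, value) pair into a readable description. The constants mirror the numeric values of ida_ua.o_void through ida_ua.o_near (0 to 7); the helper name describe_operand and the wording of the labels are my own, with "code address" chosen for o_far/o_near because the function returns op.addr for them.

```python
# Numeric values of the generic operand types in ida_ua
O_VOID, O_REG, O_MEM, O_PHRASE, O_DISPL, O_IMM, O_FAR, O_NEAR = range(8)

_MEANING = {
    O_REG: "register number",
    O_MEM: "memory address",
    O_PHRASE: "phrase number",
    O_DISPL: "displacement",
    O_IMM: "immediate value",
    O_FAR: "code address",
    O_NEAR: "code address",
}

def describe_operand(op_type, value):
    """Describe what get_operand_value's result means for a given operand type."""
    meaning = _MEANING.get(op_type)
    if meaning is None:
        # o_void and processor-specific types: get_operand_value returns -1
        return "no generic value"
    return "%s %#x" % (meaning, value)

# Inside IDA:
#   import idc
#   ea = idc.here()
#   print(describe_operand(idc.get_operand_type(ea, 0), idc.get_operand_value(ea, 0)))
```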

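Instruction analysis examples

With MIPS ruled out, I picked an arbitrary ARM binary instead. In the instruction below, both the first and the second operand are registers: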

Python>ea = idc.here()
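Python>idc.GetDisasm(ea)            # disassemble the instruction
'MOV             R4, R0'
Python>idc.get_operand_type(ea, 0)  # type of the first operand: register
0x1
Python>idc.get_operand_value(ea, 0) # value of the first operand: register number 4 -> R4
0x4
Python>idc.get_operand_type(ea, 1)  # type of the second operand: register
0x1
Python>idc.get_operand_value(ea, 1) # value of the second operand: register number 0 -> R0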
0x0

The same analysis for an LDR instruction:

Python>ea = idc.here()
Python>idc.GetDisasm(ea)
'LDR             R3, =aTagSIsEmpty; "tag %s is empty\\n"'
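Python>idc.get_operand_type(ea, 0)  # type of the first operand: register
0x1
Python>idc.get_operand_value(ea, 0) # value of the first operand: register number 3 -> R3
0x3
Python>idc.get_operand_type(ea, 1)  # type of the second operand: direct memory reference
0x2
Python>idc.get_operand_value(ea, 1) # value of the second operand: memory address 0x12a8ac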
0x12a8ac

Jumping to 0x12a8ac confirms that this location does hold the address of the string.

.text:0012A8AC off_12A8AC      DCD aTagSIsEmpty        ; DATA XREF: sub_12A774+C8↑r
.text:0012A8AC                                         ; "tag %s is empty\n"

And for a BL branch instruction:

Python>ea = idc.here()
Python>idc.GetDisasm(ea)
'BL              ProcUserLog'
Python>idc.get_operand_type(ea, 0)
0x7
Python>idc.get_operand_value(ea, 0)
0x41e2c

The operand type 0x7 is o_near, and the target address 0x41e2c lies in the plt:

.plt:00041E2C
.plt:00041E2C
.plt:00041E2C ; Attributes: thunk
.plt:00041E2C
.plt:00041E2C ProcUserLog
.plt:00041E2C ADR             R12, 0x141E34
.plt:00041E30 ADD             R12, R12, #0x41000
.plt:00041E34 LDR             PC, [R12,#(ProcUserLog_ptr - 0x182E34)]! ; __imp_ProcUserLog
.plt:00041E34 ; End of function ProcUserLog
.plt:00041E34
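Putting the pieces together, the BL analysis above generalizes to collecting every direct call target in a function. The sketch below is one possible way to do it, not a definitive implementation: call_targets, api, and call_mnemonics are hypothetical names, api stands for the idc module inside IDA, and matching only the mnemonic "BL" is an ARM-specific assumption taken from the example.

```python
O_NEAR = 7  # numeric value of ida_ua.o_near, as seen in the BL example

def call_targets(items, api, call_mnemonics=("BL",)):
    """Map each direct call instruction address to its target address.

    items: instruction addresses, e.g. idautils.FuncItems(ea) inside IDA.
    api:   object providing print_insn_mnem, get_operand_type and
           get_operand_value; inside IDA, pass the idc module.
    """
    targets = {}
    for ea in items:
        if api.print_insn_mnem(ea) not in call_mnemonics:
            continue
        if api.get_operand_type(ea, 0) != O_NEAR:
            continue  # not a direct code address, e.g. a call through a register
        targets[ea] = api.get_operand_value(ea, 0)
    return targets

# Inside IDA:
#   import idautils, idc
#   for ea, target in call_targets(idautils.FuncItems(idc.here()), idc).items():
#       print(hex(ea), "->", hex(target))
```

Filtering on the operand type rather than on the printed operand string keeps the check independent of how IDA names the target.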