Self-modifying code

Self-modification can be accomplished in a variety of ways depending upon the programming language and its support for pointers and/or access to dynamic compiler or interpreter "engines": • overlay of existing instructions (or parts of instructions such as opcode, register, flags or addresses) • direct creation of whole instructions or sequences of instructions in memory • creation or modification of source code statements followed by a "mini compile" or a dynamic interpretation (see eval statement) • creating an entire program dynamically and then executing it Assembly language Self-modifying code is quite straightforward to implement when using assembly language. Instructions can be dynamically created in memory (or else overlaid over existing code in non-protected program storage), in a sequence equivalent to the ones that a standard compiler may generate as the object code. With modern processors, there can be unintended side effects on the CPU cache that must be considered. The method was frequently used for testing "first time" conditions, as in this suitably commented IBM/360 assembler example. It uses instruction overlay to reduce the instruction path length by , where N is the number of records on the file (−1 being the overhead to perform the overlay). SUBRTN NOP OPENED FIRST TIME HERE? * The NOP is x'4700'<Address_of_opened> OI SUBRTN+1,X'F0' YES, CHANGE NOP TO UNCONDITIONAL BRANCH (47F0...) OPEN INPUT AND OPEN THE INPUT FILE SINCE IT'S THE FIRST TIME THRU OPENED GET INPUT NORMAL PROCESSING RESUMES HERE ... Alternative code might involve testing a "flag" each time through. The unconditional branch is slightly faster than a compare instruction, as well as reducing the overall path length. In later operating systems for programs residing in protected storage, this technique could not be used, and so changing the pointer to the subroutine would be used instead. The pointer would reside in dynamic storage and could be altered at will after the first pass to bypass the OPEN (having to load a pointer first instead of a direct branch and link to the subroutine would add N instructions to the path length – but there would be a corresponding reduction of N for the unconditional branch that would no longer be required). Below is an example in Zilog Z80 assembly language. The code increments register B in range [0, 5]. The CP compare instruction is modified on each loop. ;========== ORG 0H CALL FUNC00 HALT ;========== FUNC00: LD A,6 LD HL,label01+1 LD B,(HL) label00: INC B LD (HL),B label01: CP $0 JP NZ,label00 RET ;========== Self-modifying code is sometimes used to overcome limitations in a machine's instruction set. For example, in the Intel 8080 instruction set, one cannot input a byte from an input port that is specified in a register. The input port is statically encoded in the instruction itself, as the second byte of a two-byte instruction. Using self-modifying code, it is possible to store a register's contents into the second byte of the instruction, then execute the modified instruction in order to achieve the desired effect. High-level languages Some compiled languages explicitly permit self-modifying code. For example, the ALTER verb in COBOL may be implemented as a branch instruction that is modified during execution. Some batch programming techniques involve the use of self-modifying code. Clipper and SPITBOL also provide facilities for explicit self-modification. The Algol compiler on B6700 systems offered an interface to the operating system whereby executing code could pass a text string or a named disc file to the Algol compiler and was then able to invoke the new version of a procedure. With interpreted languages, the "machine code" is the source text and may be susceptible to editing on-the-fly: in SNOBOL the source statements being executed are elements of a text array. Other languages, such as Perl and Python, allow programs to create new code at run-time and execute it using an eval function, but do not allow existing code to be mutated. The illusion of modification (even though no machine code is really being overwritten) is achieved by modifying function pointers, as in this JavaScript example: var f = function (x) {return x + 1}; // assign a new definition to f: f = new Function('x', 'return x + 2'); Lisp macros also allow runtime code generation without parsing a string containing program code. The Push programming language is a genetic programming system that is explicitly designed for creating self-modifying programs. While not a high-level language, it is not as low-level as assembly language. Compound modification Prior to the advent of multiple windows, command-line systems might offer a menu system involving the modification of a running command script. Suppose an MS-DOS batch file MENU.BAT contains the following: :start SHOWMENU.EXE Upon initiation of MENU.BAT from the command line, SHOWMENU presents an on-screen menu, with possible help information, example usages and so forth. Eventually the user makes a selection that requires a command SOMENAME to be performed: SHOWMENU exits after rewriting the file MENU.BAT to contain :start SHOWMENU.EXE CALL SOMENAME.BAT GOTO start Because the command interpreter does not compile a script file and then execute it, nor does it read the entire file into memory before starting execution, nor yet rely on the content of a record buffer, when SHOWMENU exits, the command interpreter finds a new command to execute (it is to invoke the script file SOMENAME, in a directory location and via a protocol known to SHOWMENU), and after that command completes, it goes back to the start of the script file and reactivates SHOWMENU ready for the next selection. Should the menu choice be to quit, the file would be rewritten back to its original state. Although this starting state has no use for the label, it, or an equivalent amount of text is required, because the command interpreter recalls the byte position of the next command when it is to start the next command, thus the re-written file must maintain alignment for the next command start point to indeed be the start of the next command. Aside from the convenience of a menu system (and possible auxiliary features), this scheme means that the SHOWMENU.EXE system is not in memory when the selected command is activated, a significant advantage when memory is limited. Control tables Control table interpreters can be considered to be, in one sense, "self-modified" by data values extracted from the table entries (rather than specifically hand coded in conditional statements of the form IF inputx = 'yyy'). Channel programs Some IBM access methods traditionally used self-modifying channel programs, where a value, such as a disk address, is read into an area referenced by a channel program, where it is used by a later channel command to access the disk. ==History==