Basic syntax

The Icon language is derived from the ALGOL class of structured programming languages, and thus has syntax similar to C or Pascal. Icon is most similar to Pascal, using := syntax for assignments, the procedure keyword and similar syntax. On the other hand, Icon uses C-style braces for structuring execution groups, and programs start by running a procedure called main.

In many ways Icon also shares features with most scripting languages (as well as SNOBOL and SL5, from which they were taken): variables do not have to be declared, values are converted between types automatically, and numbers can be converted to strings and back automatically. Another feature common to many scripting languages, but not all, is the lack of a line-ending character; in Icon, lines that do not end with a semicolon are ended by an implied semicolon if it makes sense.

Procedures are the basic building blocks of Icon programs. Although they use Pascal naming, they work more like C functions and can return values; there is no function keyword in Icon.

    procedure doSomething(aString)
        write(aString)
    end
Goal-directed execution

One of the key concepts in SNOBOL was that its functions returned "success" or "failure" as primitives of the language rather than using magic numbers or other techniques. For example, a function that returns the position of a substring within another string is a common routine found in most language runtime systems. In JavaScript, finding the position of the word "World" within the string "Hello, World!" would be accomplished with "Hello, World!".indexOf("World"), which returns 7. If one instead searches for "Goodbye", the code will "fail", as the search term does not appear in the string. In JavaScript, as in most languages, this is indicated by returning a magic number, in this case -1.

In SNOBOL, an operation that fails in this way does not return a magic number; it signals failure directly. SNOBOL's syntax operates directly on the success or failure of the operation, jumping to labelled sections of the code without having to write a separate test. For instance, the following code prints "Hello, world!" five times:

    * SNOBOL program to print Hello World
          I = 1
    LOOP  OUTPUT = "Hello, world!"
          I = I + 1
          LE(I, 5) : S(LOOP)
    END

To perform the loop, the less-than-or-equal predicate, LE, is called on the index variable I, and if it succeeds, meaning I is less than or equal to 5, the program branches to the label LOOP and continues.

Icon retained the concept of flow control based on success or failure but developed the language further. One change was the replacement of the labelled GOTO-like branching with block-oriented structures, in keeping with the structured programming style that was sweeping the computer industry in the late 1960s. The second was to allow failure to be passed along the call chain so that entire blocks succeed or fail as a whole. This is a key concept of the Icon language. Whereas in traditional languages one would have to include code to test for success or failure based on Boolean logic and then branch on the outcome, such tests and branches are inherent to Icon code and do not have to be written explicitly.

For instance, consider this bit of code written in the Java programming language. It calls the function read() to read a character from a (previously opened) file, assigns the result to the variable a, and then writes the value of a to another file. The result is to copy one file to another. Eventually read will run out of characters to read from the file, potentially on its very first call, at which point there is no valid character to assign to a or to pass to write. To signal this situation, read returns the special value EOF (end-of-file), which requires an explicit test to avoid writing it (in real Java, InputStream.read() returns -1 at the end of a stream, so EOF stands for that value here):

    while ((a = read()) != EOF) {
        write(a);
    }

In contrast, in Icon the read() function either returns a line of text or fails. Failure is not simply an analog of EOF, as it is explicitly understood by the language to mean "stop processing" or "do the fail case" depending on the context. The equivalent code in Icon is:

    while a := read() do write(a)

This means "as long as read does not fail, call write; otherwise stop". There is no need to specify a test against a magic number as in the Java example; the test is implicit, and the resulting code is simpler. Because success and failure are passed up through the call chain, one can embed function calls within others, and evaluation stops when the nested function call fails. For instance, the code above can be reduced to:

    while write(read())

In this version, if the read call fails, the write call fails, and the while loop stops.

Icon's branching and looping constructs are all based on the success or failure of the code inside them, not on an arbitrary Boolean test provided by the programmer. An if expression performs its then clause if its test produces a value, and performs its else clause (or moves on to the next line) if the test fails. Likewise, while repeats its body for as long as its control expression succeeds, and stops as soon as it fails. Icon refers to this concept as goal-directed execution.

It is important to contrast the concept of success and failure with the concept of an exception. Exceptions are unusual situations, not expected outcomes; failures in Icon are expected outcomes. Reaching the end of a file is an expected situation, not an exception. Icon does not have exception handling in the traditional sense, although failure is often used in exception-like situations. For instance, if the file to be read does not exist, the attempt to open it simply fails, without any special situation being indicated. In traditional languages, these "other conditions" have no natural way of being indicated; additional magic numbers may be used, but more typically exception handling is used to "throw" a value. For instance, to handle a missing file in the Java code, one might see:

    try {
        while ((a = read()) != EOF) {
            write(a);
        }
    } catch (Exception e) {
        // something else went wrong, use this catch to exit the loop
    }

This case needs two comparisons: one for EOF and another for all other errors. Since Java does not allow exceptions to be combined with other tests as logic elements, as failures can be under Icon, the lengthier try/catch syntax must be used instead. Depending on the implementation, try blocks may also impose a cost even when no exception is thrown, a distributed cost that Icon normally avoids.

Icon uses this same goal-directed mechanism to perform traditional Boolean tests, although with subtle differences. A simple comparison like if a < b then write("a is smaller than b") does not mean "if the conditional expression evaluates to a true value", as it would in most languages; instead, it means something more like "if the conditional expression, here the < operation, succeeds and does not fail". In this case, the < operator succeeds if the comparison is true. The if evaluates its then clause if the expression succeeds, and either its else clause (if present) or the next line if it fails. The result is similar to the traditional if/then seen in other languages: the then clause is performed if a is less than b. The subtlety is that the same comparison expression can be placed anywhere, for instance:

    write(a < b)

Another difference is that the < operator returns its second argument if it succeeds, which in this example results in the value of b being written if it is larger than a; otherwise nothing is written. As this is not a test per se, but an operator that returns a value, comparisons can be strung together, allowing expressions like if a < b < c, a common type of comparison that in most languages must be written as a conjunction of two inequalities, like if (a < b) && (b < c).

A key aspect of goal-directed execution is that the program may have to rewind to an earlier state if a procedure fails, a task known as backtracking. For instance, consider code that sets a variable to a starting location and then performs operations that may change the value. This is common in string scanning operations, which advance a cursor through the string as they scan. If the procedure fails, it is important that any subsequent reads of that variable return the original state, not the state as it was being internally manipulated. For this task, Icon has the reversible assignment operator, <-, and the reversible exchange, <->. For instance, consider some code that is attempting to find a pattern string within a larger string:

    {
        (i := 10) &
        (j := (i < find(pattern, inString)))
    }

This code begins by setting i to 10, the starting location for the search. However, if the find fails, the block fails as a whole, which leaves the value of i at 10 as an undesirable side effect. Replacing i := 10 with i <- 10 indicates that i should be reset to its previous value if the block fails. This provides an analog of atomicity in the execution.
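The success/failure and backtracking behavior described above can be approximated in Python. The sketch below is an illustrative model, not how Icon is implemented: each Icon expression is represented as a Python generator, a yielded value counts as success, and a generator that yields nothing has failed. The names lt, conj, assign, rev_assign, first and FAIL are hypothetical, introduced only for this sketch; because Python generators must be driven explicitly, first stands in for Icon's bounded evaluation of a test such as an if condition.

```python
from typing import Callable, Dict, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# Model: an Icon expression is a Python generator. Each value it yields is a
# success; a generator that finishes without yielding has failed.

FAIL = object()  # hypothetical sentinel: a bounded expression produced no result


def lt(a: int, b: int) -> Iterator[int]:
    """Model of Icon's a < b: succeed with the right operand b, or fail."""
    if a < b:
        yield b


def conj(left: Iterator[T], right: Callable[[T], Iterator[U]]) -> Iterator[U]:
    """Model of e1 & e2: if e2 fails, backtrack into e1 for its next result."""
    for value in left:
        yield from right(value)


def assign(env: Dict[str, object], name: str, value: T) -> Iterator[T]:
    """Model of ordinary assignment name := value; backtracking does not undo it."""
    env[name] = value
    yield value


def rev_assign(env: Dict[str, object], name: str, value: T) -> Iterator[T]:
    """Model of reversible assignment name <- value."""
    old = env.get(name)
    env[name] = value
    yield value
    # Reached only if a later failure backtracks into this expression:
    # restore the previous value, then fail by yielding nothing more.
    env[name] = old


def first(expr: Iterator[T]) -> object:
    """Model of a bounded expression such as an if test: first result or FAIL."""
    return next(expr, FAIL)


if __name__ == "__main__":
    # a < b < c: succeeds with c only if both comparisons succeed.
    print(first(conj(lt(1, 2), lambda b: lt(b, 3))))  # 3
    # (i := 10) & (i < 5): the block fails, but i is left at 10.
    env = {"i": 1}
    print(first(conj(assign(env, "i", 10), lambda i: lt(i, 5))) is FAIL, env["i"])  # True 10
    # (i <- 10) & (i < 5): the failure backtracks into <-, which restores i.
    env = {"i": 1}
    print(first(conj(rev_assign(env, "i", 10), lambda i: lt(i, 5))) is FAIL, env["i"])  # True 1
```

The only difference between the two assignment models is the line after the yield, which runs when the conjunction backtracks into the assignment; that mirrors the distinction the text draws between := and <-.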
Generators

Expressions in Icon may return a single value; for instance, 5 > x will evaluate and return x if the value of x is less than 5, and otherwise will fail and return no value. Icon also includes the concept of procedures that do not immediately return success or failure, and instead return new values every time they are resumed. These are known as generators, and are a key part of the Icon language. Within the parlance of Icon, the evaluation of an expression or function produces a result sequence. A result sequence contains all the possible values that can be generated by the expression or function. When the result sequence is exhausted, the expression or function fails.

Icon allows any procedure to return a single value or multiple values, controlled using the fail, return and suspend keywords. A procedure also fails whenever execution runs off its end, just as if it had executed fail. For instance:

    procedure f(x)
        if x > 0 then {
            return 1
        }
    end

Calling f(5) will return 1, but calling f(-1) will fail. This can lead to non-obvious behavior; for instance, write(f(-1)) will output nothing, because f fails and so write is never called.

Converting a procedure to be a generator uses the suspend keyword, which means "return this value, and when resumed, continue execution from this point". In this respect it is something like a combination of the static concept in C and return. For instance:

    procedure ItoJ(i, j)
        while i <= j do {
            suspend i
            i +:= 1
        }
        fail
    end

creates a generator that returns a series of numbers starting at i and ending at j, and then fails after that. The suspend i stops execution and returns the value of i without resetting any of the state. When the generator is resumed, execution picks up at that point with the previous values. In this case, that causes it to perform i +:= 1, loop back to the start of the while block, and then return the next value and suspend again. This continues until i <= j fails, at which point it exits the block and calls fail. This allows iterators to be constructed with ease.

Another type of generator-builder is the alternator, which looks and operates like the Boolean or operator. For instance:

    if y < (x | 5) then write("y=", y)

This appears to say "if y is smaller than x or 5 then...", but it is actually shorthand for a generator that produces each of its alternatives in turn until it runs out. The values are "injected" into the surrounding operation, in this case <. So in this example, the system first tests y < x. If that succeeds, the if succeeds and y is written. If it fails, the alternator produces its next value, 5, and the system tests y < 5. If neither comparison succeeds, the alternator runs out of values, the if fails, and nothing is written. Thus y is written if it is smaller than x or 5, fulfilling the purpose of a Boolean or. Because a function is not called unless evaluating its arguments succeeds, this example can be shortened to:

    write("y=", (x | 5) > y)

Internally, the alternator is not simply an or, and one can also use it to construct arbitrary lists of values. This can be used to iterate over arbitrary values, like:

    every i := (1|3|4|5|10|11|23) do write(i)

As lists of integers are commonly found in many programming contexts, Icon also includes the to keyword to construct ad hoc integer generators:

    every k := 1 to 10 do write(k)

which can be shortened to:

    every write(1 to 10)

As Icon is dynamically typed, the alternatives can be values of different types:

    every i := (1 | "hello" | x < 5) do write(i)

This writes 1, "hello" and, if x is less than 5, also 5.

Likewise the conjunction operator, &, is used in a fashion similar to a Boolean and operator:

    every x := ItoJ(0,10) & x % 2 = 0 do write(x)

This code calls ItoJ, which produces an initial value of 0 that is assigned to x. It then evaluates the right-hand side of the conjunction, and since x % 2 does equal 0, it writes out the value. It then resumes the ItoJ generator, which assigns 1 to x; this fails the right-hand side and prints nothing. The result is a list of every even integer from 0 to 10.

The concept of generators is particularly useful and powerful when used with string operations, and is a major underlying basis for Icon's overall design. Consider the indexOf operation found in many languages; this function looks for one string within another and returns an index of its location, or a magic number if it is not found. For instance:

    s = "All the world's a stage. And all the men and women merely players";
    i = indexOf("the", s);
    write(i);

This will scan the string s, find the first occurrence of "the", and return that index, in this case 4 (counting from 0). The string, however, contains two instances of the string "the", so to return the second one an alternate syntax is used:

    j = indexOf("the", s, i + 1);
    write(j);

This tells it to scan starting at location 5, so it will not match the first instance found previously. However, there may not be a second instance of "the" (there may not be a first one either), so the return value from indexOf has to be checked against the magic number -1, which is used to indicate no matches. A complete routine that prints out the location of every instance is:

    s = "All the world's a stage. And all the men and women merely players";
    i = indexOf("the", s);
    while i != -1 {
        write(i);
        i = indexOf("the", s, i + 1);
    }

In Icon, the equivalent is a generator, so the same locations can be produced with a single line (Icon numbers string positions from 1, so the values written are 5 and 34 rather than 4 and 33):

    s := "All the world's a stage. And all the men and women merely players"
    every write(find("the", s))

Of course, there are times when one does want to find a string after some point in the input, for instance when scanning a text file that contains a line number in the first four columns, a space, and then a line of text. Goal-directed execution can be used to skip over the line numbers:

    every write(5 < find("the", s))

The position will only be returned if "the" appears after position 5; the comparison will fail otherwise, pass the failure to write, and the write will not occur.

The every operator is similar to while, looping through every item returned by a generator and exiting on failure:

    every k := i to j do
        write(someFunction(k))

There is a key difference between every and while: while re-evaluates its control expression from scratch on each pass, using only its first result and stopping when it fails, whereas every resumes the same generator to fetch its next value. every actually injects values into the function in a fashion similar to blocks under Smalltalk. For instance, the above loop can be rewritten this way:

    every write(someFunction(i to j))

In this case, the values from i to j will be injected into someFunction and (potentially) write multiple lines of output.
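Because Python also has generators, several of the Icon examples above have fairly direct Python counterparts. The sketch below is an analogy rather than a translation of Icon's semantics: ItoJ mirrors the Icon procedure of the same name, find is a hypothetical helper that mimics Icon's find by generating 1-based positions, and less_than_some is a hypothetical helper modelling (x | 5) > y. Unlike Icon, Python does not backtrack implicitly; here the generators are driven by comprehensions and list().

```python
from typing import Iterator, Optional


def ItoJ(i: int, j: int) -> Iterator[int]:
    """Python counterpart of the Icon ItoJ procedure."""
    while i <= j:
        yield i      # like 'suspend i': produce i, resume here when asked again
        i += 1       # like 'i +:= 1'
    # Returning normally plays the role of Icon's final 'fail'.


def find(needle: str, haystack: str) -> Iterator[int]:
    """Generate every position of needle in haystack, counting from 1 as Icon does."""
    index = haystack.find(needle)              # str.find returns -1 when absent
    while index != -1:
        yield index + 1                        # 0-based index -> Icon position
        index = haystack.find(needle, index + 1)


def less_than_some(y: int, *alternatives: int) -> Optional[int]:
    """Rough model of (x | 5) > y; None stands for failure."""
    for candidate in alternatives:   # alternation produces its operands left to right
        if y < candidate:            # the first comparison that succeeds wins
            return y                 # Icon's > returns its right operand, here y
    return None                      # every alternative failed


if __name__ == "__main__":
    s = "All the world's a stage. And all the men and women merely players"
    # even integers produced by ItoJ(0, 10)
    print([x for x in ItoJ(0, 10) if x % 2 == 0])   # [0, 2, 4, 6, 8, 10]
    # every write(find("the", s))
    print(list(find("the", s)))                     # [5, 34]
    # every write(5 < find("the", s))
    print([p for p in find("the", s) if 5 < p])     # [34]
    print(less_than_some(3, 2, 5), less_than_some(9, 2, 5))  # 3 None
```

The filter 5 < p in the last comprehension plays the role of the comparison that fails and suppresses the write in the Icon version.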
Collections

Icon includes several collection types, including lists that can also be used as stacks and queues, tables (also known as maps or dictionaries in other languages), sets and others. Icon refers to these as structures. Collections can be used as generators of their elements through the bang operator, !. Consider first a program that reads lines into a list used as a stack and then writes them out:

    lines := []                    # create an empty list
    while line := read() do {      # loop reading lines from standard input
        push(lines, line)          # use stack-like syntax to push the line on the list
    }
    while line := pop(lines) do {  # loop while lines can be popped off the list
        write(line)                # write the line out
    }

Using the failure propagation seen in earlier examples, the tests and the loops can be combined:

    lines := []                    # create an empty list
    while push(lines, read())      # push until read fails at the end of input
    while write(pop(lines))        # write until the list is empty

Because ! turns the list into a generator of its elements, this can be further simplified with the bang syntax:

    lines := []
    every push(lines, !&input)
    every write(!lines)

In this case, the bang in write(!lines) causes Icon to produce the lines of text one by one from the list and finally fail at the end. &input is the standard input file, and applying ! to a file generates its lines, so !&input continues reading lines until the file ends. Because push adds each line to the front of the list, all three versions write the lines in reverse order.

As Icon is dynamically typed, lists can contain values of different types:

    aCat := ["muffins", "tabby", 2002, 8]

The items can include other structures. To build larger lists, Icon includes the list function; list(10, "word") creates a list containing 10 copies of "word". Like arrays in other languages, Icon allows items to be looked up by position, e.g., weight := aCat[4]. Array slicing is included, allowing new lists to be created out of the elements of other lists; for instance, aCat[2:4] produces a new list that contains "tabby" and 2002.

Tables are essentially lists with arbitrary index keys rather than integers:

    symbols := table(0)
    symbols["there"] := 1
    symbols["here"] := 2

This code creates a table that will use zero as the default value of any unknown key. It then adds two items to the table, with the keys "there" and "here" and the values 1 and 2.

Sets are also similar to lists but contain only a single member of any given value. Icon includes the ++ operator to produce the union of two sets, ** for the intersection, and -- for the difference. Icon also includes a number of predefined csets (sets of characters), such as &ucase, &lcase, &letters and &digits. New csets can be made by enclosing a string in single quotes, for instance, vowel := 'aeiou'.
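For readers coming from Python, the collection operations above map roughly onto Python's built-in types. The sketch below is an approximation under stated assumptions: Icon's push, which adds at the front of a list, is modelled with collections.deque; Icon's table(0) default is modelled by a hypothetical Table class built on dict.__missing__; the helper icon_section is hypothetical and handles only positive positions; and Python's set operators |, & and - stand in for Icon's ++, ** and --.

```python
from collections import deque
from typing import Any, Deque, Iterable, List


def reverse_lines(lines: Iterable[str]) -> List[str]:
    """Mimic: every push(stack, !lines); every write(!stack)."""
    stack: Deque[str] = deque()
    for line in lines:
        stack.appendleft(line)   # Icon's push adds at the front (left end) of a list
    return list(stack)           # iterating, like !stack, visits front to back


class Table(dict):
    """Dictionary with a default value, loosely like Icon's table(default)."""

    def __init__(self, default: Any) -> None:
        super().__init__()
        self.default = default

    def __missing__(self, key: Any) -> Any:
        # Looking up an unknown key yields the default without inserting it.
        return self.default


def icon_section(items: List[Any], i: int, j: int) -> List[Any]:
    """L[i:j] for positive Icon positions 1 <= i <= j, which lie between elements."""
    return items[i - 1:j - 1]


if __name__ == "__main__":
    print(reverse_lines(["first", "second", "third"]))   # ['third', 'second', 'first']
    aCat = ["muffins", "tabby", 2002, 8]
    print(icon_section(aCat, 2, 4))                      # ['tabby', 2002]
    words = ["word"] * 10                                # like Icon's list(10, "word")
    print(len(words))                                    # 10
    symbols = Table(0)
    symbols["there"] = 1
    symbols["here"] = 2
    print(symbols["here"], symbols["nowhere"])           # 2 0
    vowel = frozenset("aeiou")                           # like the cset 'aeiou'
    other = frozenset("abcde")
    # Python's |, & and - play the roles of Icon's ++, ** and --.
    print(sorted(vowel | other), sorted(vowel & other), sorted(vowel - other))
```

Using __missing__ rather than collections.defaultdict keeps lookups of unknown keys from inserting them, which matches the description of table(0) as simply supplying a default value.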
Strings

In Icon, strings are sequences of characters, and the bang operator generates those characters one at a time, so a string can be iterated over much like a list:

    every write(!"Hello, world!")

This prints each character of the string on a separate line.

Substrings can be extracted from a string by using a range specification within brackets. A range specification can select a single character or a slice of the string. Strings can be indexed from either the left or the right. Positions within a string are defined to be between the characters:

    1 A 2 B 3 C 4

and can also be specified from the right:

    -3 A -2 B -1 C 0

For example:

    "Wikipedia"[1]    ==> "W"
    "Wikipedia"[3]    ==> "k"
    "Wikipedia"[-1]   ==> "a"
    "Wikipedia"[1:3]  ==> "Wi"
    "Wikipedia"[-2:0] ==> "ia"
    "Wikipedia"[2+:3] ==> "iki"

where the last example gives a length instead of an ending position.

The subscripting specification can be used as an lvalue within an expression. This can be used to insert strings into another string or to delete parts of a string. For example:

    s := "abc"
    s[2] := "123"
    # s now has a value of "a123c"
    s := "abcdefg"
    s[3:5] := "ABCD"
    # s now has a value of "abABCDefg"
    s := "abcdefg"
    s[3:5] := ""
    # s now has a value of "abefg"
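Icon's position numbering differs from Python's 0-based indexing, so a small conversion layer makes the examples above easy to check. The sketch below is a hypothetical model written for illustration; the names icon_pos, section, subscript and assign_section do not come from any library. Icon failure is represented by raising IndexError, and, as in Icon, the two bounds of a section may be given in either order.

```python
def icon_pos(s: str, p: int) -> int:
    """Convert an Icon position (1..len+1, or -len..0 from the right) to 1-based form."""
    n = len(s)
    if p <= 0:
        p = n + 1 + p                  # 0 is the end, -1 is before the last character
    if not 1 <= p <= n + 1:
        raise IndexError("position out of range")   # Icon would fail here
    return p


def section(s: str, i: int, j: int) -> str:
    """Model of s[i:j]: the characters between positions i and j (either order)."""
    a, b = sorted((icon_pos(s, i), icon_pos(s, j)))
    return s[a - 1:b - 1]


def subscript(s: str, i: int) -> str:
    """Model of s[i]: the single character just after position i."""
    p = icon_pos(s, i)
    if p > len(s):
        raise IndexError("no character after this position")
    return s[p - 1]


def assign_section(s: str, i: int, j: int, value: str) -> str:
    """Model of s[i:j] := value; returns the new string for the caller to rebind."""
    a, b = sorted((icon_pos(s, i), icon_pos(s, j)))
    return s[:a - 1] + value + s[b - 1:]


if __name__ == "__main__":
    w = "Wikipedia"
    print(subscript(w, 1), subscript(w, 3), subscript(w, -1))         # W k a
    print(section(w, 1, 3), section(w, -2, 0), section(w, 2, 2 + 3))  # Wi ia iki
    s = "abcdefg"
    s = assign_section(s, 3, 5, "ABCD")                               # s[3:5] := "ABCD"
    print(s)                                                          # abABCDefg
```

An Icon length-based range such as s[2+:3] corresponds to section(s, 2, 2 + 3) here, and s[i] := x corresponds to assign_section(s, i, i + 1, x).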
String scanning

A further simplification for handling strings is the scanning system, invoked with ?, which calls functions on a string:

    s ? write(find("the"))

Icon refers to the left-hand side of the ? as the subject, and passes it into string functions. Recall that find takes two parameters, the search text as parameter one and the string to search in parameter two. Within a scanning expression, the second parameter is implicit and does not have to be specified by the programmer. In the common case where multiple functions are called on a single string in sequence, this style can significantly reduce the length of the resulting code and improve clarity. Icon function signatures identify the subject parameter in their definitions so the parameter can be hoisted in this fashion.

The ? is not simply a form of syntactic sugar; it also sets up a "string scanning environment" for any following string operations. This is based on two internal variables, &subject and &pos; &subject is simply a pointer to the original string, while &pos is the current position within it, or cursor. Icon's various string manipulation procedures use these two variables so they do not have to be explicitly supplied by the programmer. For example:

    s := "this is a string"
    s ? write("subject=[", &subject, "], pos=[", &pos, "]")

would produce:

    subject=[this is a string], pos=[1]

Built-in and user-defined functions can be used to move around within the string being scanned. All of the built-in functions default to &subject and &pos to allow the scanning syntax to be used. The following code writes all blank-delimited "words" in a string:

    s := "this is a string"
    s ? {                               # establish string scanning environment
        while not pos(0) do {           # test for end of string
            tab(many(' '))              # skip past any blanks
            word := tab(upto(' ') | 0)  # the next word is up to the next blank -or- the end of the line
            write(word)                 # write the word
        }
    }

There are a number of new functions introduced in this example. pos(i) succeeds, producing the cursor position, if &pos is at position i, and fails otherwise. It may not be immediately obvious why one would need this function rather than testing &pos directly; the reason is that &pos is a variable and thus cannot fail, whereas the procedure pos can. Thus pos provides a lightweight wrapper on &pos that allows Icon's goal-directed flow control to be used without hand-written Boolean tests against &pos. In this case, the test is "is the cursor at position 0", which, in Icon's numbering of string positions, is the end of the string. If it is not, pos(0) fails, the not inverts that failure into success, and the loop continues.

The function many(c) matches one or more characters from the cset c starting at the current &pos. In this case, it is looking for space characters, so the result of this function is the position of the first non-space character after &pos. The function tab moves &pos to that position; this step can also fail, for instance when the character at the cursor is not a blank, in which case many fails and the cursor stays where it is. upto(c) does roughly the reverse of many: it produces the position of the next character in c, which the example then moves &pos to with another tab. Alternation with 0 is used to also stop at the end of the string.

This example can be made more robust through the use of a more appropriate "word breaking" cset, which might include periods, commas and other punctuation, as well as other whitespace characters like tabs and non-breaking spaces. That cset can then be used in many and upto.

A more complex example demonstrates the integration of generators and string scanning within the language.

    procedure main()
        s := "Mon Dec 8"
        s ? write(Mdate() | "not a valid date")
    end

    # Define a matching function that returns
    # a string that matches a day month dayofmonth
    procedure Mdate()
        # Define some initial values
        static months
        static days
        initial {
            days := ["Mon","Tue","Wed","Thr","Fri","Sat","Sun"]
            months := ["Jan","Feb","Mar","Apr","May","Jun",
                       "Jul","Aug","Sep","Oct","Nov","Dec"]
        }
        every suspend (retval

Criticisms