Scripting Languages

Introduction

What's the difference between "scripting languages" (like Perl or Tcl) and "general purpose languages" (like C++, Java, or Lisp)? For that matter, how do you define the terms "scripting language" and "general purpose language"?

Most people who address this topic [Ousterhout] say things like:

But these arguments miss the point: they overlook the real differences between current general purpose and scripting languages.

Terminology

For purposes of this article, I'll define scripting language simply as any programming language used to write scripts. Scripts are short programs, usually written quickly to solve some task at hand.

You might notice that those definitions are awfully vague. That's because there isn't actually a hard boundary between scripts on the one hand, and "real programs" or "applications", on the other. In reality, there is a continuum. Many scripts start out small and grow up to become full-fledged applications. Developers have been known to write large programs using what most programmers would consider to be a scripting language. Stallman points out that extensions (which are sort of like scripts) can often be "large, complex programs in their own right" [TclWar].

So rather than trying to draw a hard line, I think it's more interesting to look at the features that are characteristic of languages generally considered to be scripting languages (which I will shorten to simply "scripting languages"), and also at the features peculiar to general purpose languages.

Scripting Language Features

What is it that makes it easier and faster to write code in a scripting language? There are several features involved. A language which has most (or even many) of these features will generally "feel like" a scripting language. A language which has few (or none) of them will feel like a general purpose language.

Interpreter. An interpreter makes it fast and easy to run a program. Just type "perl foo.pl" and off it goes. Compare this to the typical compiler scenario. First, you have to compile everything: "gcc -g -O -Wall -c foo.c" (repeat for each module). Then link the program: "gcc -o foo foo.o ... -l...". And finally, run it: "./foo".

Native Complex Types. Scripting languages tend to provide native string, list, and dictionary (aka hash or a-list) types. These types can be implemented in pretty much any language - the point here is that scripting languages provide good native implementations. You should be able to create a string, list, or dictionary with a minimum of extra syntax. For example, "(1, 2, 3)" creates a list (note: no explicit function calls). You should also be able to concatenate two strings or two lists with a builtin operator (again, with no explicit function calls).
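For comparison, here is a minimal sketch of the same idea in Python (a language chosen purely for illustration; the article's own example uses Perl-style syntax). Lists, dictionaries, and strings are all created with literal syntax and combined with builtin operators, with no explicit constructor or library calls.

```python
# Native complex types: literals and builtin operators,
# no explicit function calls needed.
nums = [1, 2, 3]                  # list literal
more = nums + [4, 5]              # list concatenation with +
ages = {"alice": 31, "bob": 27}   # dictionary (hash) literal
ages["carol"] = 45                # insertion via subscript syntax
greeting = "hello, " + "world"    # string concatenation with +

print(more)         # [1, 2, 3, 4, 5]
print(ages["bob"])  # 27
print(greeting)     # hello, world
```

In a language without native complex types, each of these lines would typically become a call to a library constructor or helper function.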

Garbage Collection. Garbage collection relieves the programmer of the duty of keeping track of which objects need to be freed. It also makes certain kinds of memory leaks less likely (though, to be fair, it introduces some nasty new kinds of leaks, such as objects kept alive indefinitely by a forgotten reference).
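The following Python sketch (Python is our choice for illustration, not the article's) shows both sides of that trade-off. In CPython, most objects are freed by reference counting, and the gc module's cycle collector reclaims reference cycles like the one built here. The names Node, make_cycle, cache, and remember are hypothetical, invented only for this example.

```python
import gc
import weakref


class Node:
    """Hypothetical object type used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    # Two nodes that refer to each other form a reference cycle.
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a
    return weakref.ref(a)  # a weak reference does not keep `a` alive


probe = make_cycle()
gc.collect()            # the collector finds and frees the unreachable cycle
print(probe() is None)  # True: reclaimed without any explicit free

# The "new kind" of leak: a forgotten reference keeps an object alive.
cache = []


def remember(node):
    cache.append(node)  # hypothetical cache that nobody ever clears


n = Node("leaky")
remember(n)
leak_probe = weakref.ref(n)
del n
gc.collect()
print(leak_probe() is not None)  # True: still reachable through `cache`
```

The collector can only free what is unreachable; an object that is still referenced from a long-lived structure is, as far as the collector knows, still in use.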

This table lists four general purpose languages on the left (C, C++, Java, Common Lisp) and two scripting languages on the right (Tcl, Perl). The rows are the features described above, and each entry is "yes" or "no", indicating whether that language has that feature. As you'd expect, the scripting languages have lots of "yes"es, while the general purpose languages have lots of "no"s.

Feature                 C    C++  Java  Common Lisp  Tcl  Perl
Interpreter             no   no   no    no           yes  yes
Native Complex Types    no   no   no    yes          no   yes
Garbage Collection      no   no   yes   yes          yes  yes

Some notes on the table:

General Purpose Language Features

There are several interesting programming language features missing from that list. These are things which are usually associated with general purpose languages.

Compiler. Compilers make code run faster (all else being equal, i.e., given that we're comparing a compiler and interpreter for the same language). They also allow you to generate standalone executables.

Structs. Structs (aka objects or records) are a generic building block for user-defined types. Languages without structs generally end up forcing the programmer to make everything look like a list (or a string, or whatever the language happens to provide).
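To make the contrast concrete, here is a small Python sketch (illustrative only; the Employee record and its fields are hypothetical). The first record is encoded as a plain list, so the meaning of each position lives only in the programmer's head; the second uses a dataclass, Python's lightweight struct-like construct (available since Python 3.7).

```python
from dataclasses import dataclass

# Without structs: a record encoded as a list.
employee_as_list = ["Ada", 36, "Engineering"]
print(employee_as_list[2])  # which field is index 2? easy to get wrong


# With a struct-like type: named fields and a readable repr for free.
@dataclass
class Employee:
    name: str
    age: int
    department: str


employee = Employee("Ada", 36, "Engineering")
print(employee.department)  # Engineering
print(employee)  # Employee(name='Ada', age=36, department='Engineering')
```

Both versions hold the same data; the difference is that the struct version names its fields, so code that reads `employee.department` documents itself and fails loudly if the field name is wrong.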

Compile-Time Type Checking. A language that requires typed variables and does a certain amount of compile-time checking will catch programmer errors earlier, and save debugging time. Explicit types also make it easier to implement polymorphic functions. ("Explicit" is something of a fuzzy concept here - SML does compile-time type checking without requiring type annotations on variables, because it infers their types. The important thing here is the checking.)
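The cost of lacking such checks is easy to demonstrate. In the Python sketch below (the describe function is hypothetical), the type annotations are ignored by the interpreter at run time, so a type error in a rarely taken branch goes unnoticed until that branch actually executes. An external static checker such as mypy would report the faulty line without running the program, which is the kind of early error detection described above.

```python
def describe(balance: int) -> str:
    """Hypothetical function with a type error hidden in a rarely taken branch."""
    if balance < 0:
        # Bug: str + int. A static checker such as mypy flags this line
        # before the program runs; plain Python only fails when it executes.
        return "overdrawn by " + balance
    return "in credit"


print(describe(100))  # in credit (the buggy branch never executes)

caught = None
try:
    describe(-5)
except TypeError as err:  # the bug surfaces only now, at run time
    caught = err
    print("runtime error:", err)
```

If the negative-balance path is never exercised by tests, this bug can ship; a compile-time check catches it regardless of test coverage.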

Here is the continuation of the previous table, listing the general purpose language features. This time, the "yes"es are on the left side of the table.

Feature                     C    C++  Java  Common Lisp  Tcl  Perl
Compiler                    yes  yes  yes   yes          no   no
Structs                     yes  yes  yes   yes          no   no
Compile-Time Type Checking  yes  yes  yes   no           no   no

Why Not Take Them All?

There's absolutely no reason that all of the features listed here - the so-called scripting language features, as well as the general purpose language features - can't be incorporated into one programming language. Such a language could (with the appropriate development and runtime tools) be used as both a scripting language and a general purpose language. In fact, this hypothetical language could beat out existing languages (in both categories) for many tasks.

Note that all of the features discussed here are orthogonal to various other debates. For example, both imperative and functional languages could be designed with all of these features - and this wouldn't change the relative advantages and disadvantages that functional languages have compared to imperative languages.

The hypothetical language would have both a compiler and an interpreter. The interpreter would serve to run shorter programs ("scripts") as well as to do quick testing of larger programs. The compiler would be used to generate higher-performance, standalone executables. As an aside, the compiler would ideally incorporate some features currently more common in interpreters:

Native complex types would be just as useful in general purpose languages as they are in scripting languages. C++ programmers end up using awkward STL types for lists and dictionaries. It wouldn't be that hard to incorporate these into the language in a syntactically clean way. (Various general purpose languages already get this mostly right - Lisp and SML come to mind here.)

Structs (records, objects, whatever you want to call them) would be very useful in a scripting language. Perl has made a kludgey attempt to add them with its pseudohashes and the fields pragma. As has been pointed out before, one should never design a language on the assumption that people will only write short scripts in it [TclWar]. Any language that becomes at all useful (whether because it's a good language or because it's embedded in an otherwise good application) will get used to write ever larger programs.
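One thing a real struct buys you in a dynamic language is protection against misspelled field names, which is part of what Perl's fields pragma aims for. As a loose Python analogue (not a reproduction of Perl's mechanism), declaring `__slots__` makes assignment to an undeclared attribute fail instead of silently creating a new one. Note that Python reports this at run time, not compile time. The point classes are hypothetical.

```python
class LoosePoint:
    """Ordinary class: any attribute name can be assigned."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class StrictPoint:
    """__slots__ restricts instances to the declared field names."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


loose = LoosePoint(1, 2)
loose.X = 3  # meant x; silently creates a new attribute named X

strict = StrictPoint(1, 2)
try:
    strict.X = 3  # meant x; rejected because X is not a declared slot
    rejected = False
except AttributeError:
    rejected = True

print(loose.x, loose.X)  # 1 3
print(rejected)          # True
```

With the loose class, the typo leaves `x` unchanged and the program carries on with wrong data; with the slotted class, the mistake is reported at the line where it happens.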

Garbage collection is useful. There really shouldn't be argument over this anymore. (There are clearly some applications where GC cannot or should not be used - these applications should be able to disable it, or perhaps use a more appropriate language.) Even the performance objection can be answered: Appel shows that, given enough memory, garbage collection can be faster than stack allocation [Appel].

Compile-time type checking is also pretty clearly a good thing. Among users of general purpose languages, it's widely agreed that type checking catches bugs earlier and ends up saving debugging time. Why would anyone not want this for a scripting language too? Granted, it adds a few seconds to the time it takes to write a function, but the payoff in debugging time is well worth it.

Conclusion

Someone needs to design a language that gets all of these things right. I haven't seen anyone attempt this yet. Even fairly recent languages like Java and C# don't address all of these issues. (For example, they provide garbage collection, but don't do so well with complex types or an interpreter.)

References

[Appel] Andrew W. Appel, "Garbage Collection Can Be Faster Than Stack Allocation", Information Processing Letters 25(4):275-279, 17 June 1987.
http://www.cs.princeton.edu/~appel/papers/45.ps

[Ousterhout] John K. Ousterhout, "Scripting: Higher Level Programming for the 21st Century", IEEE Computer, March 1998.
http://www.tcl.tk/doc/scripting.html

[TclWar] Richard Stallman, "Why you should not use Tcl" (and various followup posts), September 1994.
http://www.vanderburg.org/Tcl/war/0000.html