Lisp Types Demystified (Part I: The Span of Common Lisp Types)

A friend of mine sent me an email today about types in programming languages, and discussed how types can alleviate certain kinds of errors. One interesting part of the email was on two orthogonal points of type systems:

I now realise that there are several issues conflated into a false dichotomy of static/dynamic.

  1. type expressivity (ML, Haskell, and Scala even more so)
  2. type checking (same family)

Traditional C and Java family languages fail completely on the first point. They have primitive type systems. Dynamic languages (PHP, Python, Ruby, Scheme, etc) obviously fail on the second point. So that covers about 90 or 95% of what most people are using.

If you address both these points, you have a very high level of expressive power (1) and safety (2).

I was thinking about how Common Lisp fits the above two points.

I really wish I could say Lisp takes a solid stance in the middle ground. Cursory inspection tells us Lisp has the following types:

  • primitive atomic types: integer (any size), fixnum (machine size), single-float, double-float, string, base-char, pathname, …
  • (parametric) compound types: (cons $\tau$ $\sigma$), (array $\tau$), (vector $\tau$), (function ($\tau_1$ $\ldots$ $\tau_n$) $\sigma$) (in functional languages, the type $\tau_1\times\cdots\times\tau_n\to\sigma$), …any more?

and perhaps most interestingly

  • simple dependent types:
    • (integer $a$ $b$): integers $x$ satisfying $a\le x\le b$, also valid for other numeric types like single-float,
    • (array $\tau$ ($\ell_1$ $\ell_2$ $\ldots$ $\ell_n$)): $n$-dimensional array of elements in $\tau$ with dimension $\ell_1\times \cdots \times \ell_n$, and
    • (vector $\tau$ $\ell$): vector with elements in $\tau$ of length $\ell$.

All of this looks great at the surface… but what can we create as a programmer? What can we do in terms of type abstraction?

Simple Data Types

We can create simple data types easily. Just use defstruct (or defclass). For example

(defstruct my-string 
   characters)

will create a new type called my-string.

Creating simple, opaque data types is easy in Lisp, except for creating opaque “enumerations”. Usually symbols are used for that purpose. For example, in Standard ML, to represent the suits of cards, we’d have:

datatype Suit = Hearts | Spades | Diamonds | Clubs

and this is its own unique type. In Lisp, we would just use the symbols hearts, spades, diamonds, and clubs, and at best, we’d define our own type for it:


(deftype suit ()
  `(member hearts spades diamonds clubs))

This, however, isn’t an opaque data type. Each of these may be confused for the type symbol.

Another option is to create a “quadrupleton” structure (a “4-analog” to singleton):

(defstruct suit kind)

and then define four of its elements

(defvar +hearts+   (make-suit :kind 'hearts))
(defvar +spades+   (make-suit :kind 'spades))
(defvar +diamonds+ (make-suit :kind 'diamonds))
(defvar +clubs+    (make-suit :kind 'clubs))

and so forth. This isn’t a very good way, because there’s no enforced rule banning programmers from making other kinds of suit kinds. (On a very related note, why can’t we use defconstant which would be more appropriate conceptually? Well, make-suit generates non-eql values. It is not generally true that

(eql (make-suit :kind 'hearts)
     (make-suit :kind 'hearts))

and therefore can’t be considered “constant” in Lisp.)

Anyway, excluding “$n$-tupleton” types (which I’ll refer to as the singleton problem) creating new simple (opaque) data types is easy, and doesn’t produce fruitful discussion really.

Compound Data Types

I consider compound data types to essentially be types which are in some way equivalent to a mathematical tuple. In Lisp, we get tuples via structures. Given $n$ concrete types $\tau_1$, …, $\tau_n$, we can create a tuple type $S = \tau_1\times\cdots\times\tau_n$ with projections $\pi_k : S \to \tau_k$ via the following structure:

(defstruct (S (:conc-name nil))
  (pi1 x1 :type tau1)
  (pi2 x2 :type tau2)
  ...
  (piN xN :type tauN))

where $x_1,\ldots,x_n$ are initial values for each of the slots in the structure. (Actually, in Lisp, we needn’t provide initial values in the defstruct itself. We can, for example, replace x1 with

(error "Must provide value for slot PI1.")

so that when we call make-S, we must provide a value for the pi1 slot.

This is a relatively straightforward construction, provided the types $\tau_k$ are already-defined types.

Parametric Data Types

Well, Lisp has no parametric polymorphism really. That means we can’t define our own parametric compound types (edit: see end of post). This is easy in SML or Haskell or Scala. We can’t define a tree with typed nodes in Lisp, for example, with arbitrary (but specific) types for the nodes. The best we can do is something like this to make a tree whose nodes are of type S

(defstruct tree-S
  (left  nil :type (or null tree-S))
  (right nil :type (or null tree-S))
  (node  nil :type (or null S)))

But this isn’t parametric. We have to specify S at compile time. This isn’t re-usable for other types: we’d have to make a tree for each type we want if we want to be sure the compiler could make use of the type information. Most don’t bother, and just create a tree that accepts any type for their nodes.

How could we write a Maybe type? To remind, in SML, it can be defined as so:

datatype 's Maybe = Just of 's
                  | Nothing

Again, this is a parametric type, so it’s effectively impossible in Lisp. Can we even do it if we give up the type polymorphism? Kind of…

;; define an abstract "type"
(defclass maybe () ())

;; define the Just branch, where VALUE is the slot which
;; can hold any type.
(defclass just (maybe)
  (value))

;; define the Nothing branch.
(defclass nothing (maybe) ())

We exploit inheritance to get what we want. This “pattern” can be extended for really all kinds of algebraic data types… minus the actual compile-time type polymorphism! Moreover, this is slow and usually not a good idea. Above, nothing is a class which could be instantiated more than once, and each instantiation would be different (non-eq; different pointers). So we’d really want a singleton class there (the singleton problem). But even so, using CLOS to emulate algebraic data types is like stapling a few pages together with 10 inch galvanized nails with a sledgehammer. And what benefit do we get? We’d need to do some wizardry to get pattern matching and other benefits. (CL-MONAD is a library which takes this approach, and also defines pattern matching macros, and monads. (Though, monads are really only useful when you have tail recursion, compile-time types, etc., but that’s a story for another post.))

Quite simply, we don’t get the same benefits as in a statically typed language, and so no one really employs this pattern. (In particular, in lieu of a maybe/option type, Lispers use (or null $\tau$), that is, return nil to represent the Nothing branch of the ADT, and return just the value itself for the Just branch. What do we do when we want $\tau=$ null?)

In sum, we can’t really define our own algebraic data types, or even our own compound recursive data types, in a useful fashion.

A Note on Recursive Structures

We actually sort of can define compound recursive types, but only very simple ones. We can’t using deftype; recursive deftypes can’t be expanded. And they are hacky.

The best we can do to replicate the traditional list data type is by using self-referential structures:

(defconstant knil 'knil)

(defstruct (kons (:conc-name nil)
                 (:constructor kons (kar kdr))
  (kar 0    :type t)
  (kdr knil :type (or (member knil) kons)))

(deftype liszt ()
  `(or (member knil)
       kons))

but notice we don’t get a compound type specifier (we have type liszt, not (liszt $\tau$)), and notice how cumbersome it is to define. We can use it as normal (and it’s possible to define special pretty printers…):

> (kons 1 (kons 2 knil))
#S(KONS :KAR 1 :KDR #S(KONS :KAR 2 :KDR KNIL))

> (typep * 'liszt)
T

> (kons 1 (kons 2 nil))
ERROR: The value NIL is not of type (OR (MEMBER KNIL) KONS).

The last thing to notice is that knil can be conflated for a symbol (the singleton problem, again):

> (symbolp knil)
T

Dependent Types

We can’t really define our own new dependent types. For example, we can’t define a type (tree $n$) which specifies a tree of depth $n$. We can only define new types which use already-defined dependent types as a foundation. For example, we could define a type square-matrix as follows:

(deftype square-matrix (type n)
  `(array ,type (,n ,n)))

And then we can use it:

> (make-array '(2 2) :initial-contents '((1 2) (3 4)))
#2A((1 2) (3 4))

> (typep * '(square-matrix integer 2))
T

The thing to notice is that we just re-used an existing dependent type, and didn’t create a truly new one.

Conclusion

So, to requote two orthogonal attributes of type systems:

  1. type expressivity (ML, Haskell, and Scala even more so)
  2. type checking (same family)

Lisp clearly has little type expressivity. The best we can do is define our own simple types (perhaps holding compound data), and very hackily define very simple recursive types. We can define other kinds of types which is more expressive than a lot of type systems, e.g., a type for even numbers is

(deftype even-integer ()
  `(and integer (satisfies evenp)))

but satisfies is limited in that it must refer to a global function, which means we can’t make our own dependent types for example. However, we don’t really get much benefit except for the fact a compiler can verify a particular value is an even integer.

For a dynamically typed language, Lisp’s type checking is very good. It is able to type check all of the built in types. It can do pretty advanced analysis of value ranges for example, or even construct specialized arrays. Compiler warnings (from e.g. SBCL and CMUCL) are very good in this respect. But then again: what’s the value in type checking if we can barely abstract our programs with our own opaque, checkable types?

Parameterized Types (Edit 6 Sept 2012)

jasom came up with a solution to implementing a crude version of parametric types in Lisp, but he says

jasom: just because you said it’s not possible, not because I think it’s a good idea

Given a structure such as

(defstruct mytree
  left
  right
  data)

we can evaluate — at macroexpansion time — a defun which creates a name we can use in a satisfies clause:

(deftype tree (datatype)
  (let ((satname (intern (prin1-to-string datatype) :mytree-types)))
    (eval `(defun ,satname (x)
             (or (null x)
                 (and (,satname (mytree-left x))
                      (,satname (mytree-right x))
                      (or (null (mytree-data x))
                          (typep (mytree-data x) (quote ,datatype)))))))
    `(and mytree
          (satisfies ,satname))))

This assumes we have a package mytree-types which would hold all of the newly defined type predicates.

This is, however, not a good idea in practice. Aside from polluting some namespace (which is, in practice, alleviated by creating our own package), this will define new functions at each macro expansion. What this allows is the compiler to do typep checks, and would likely impede on the compiler’s ability to optimize the code.

8 comments to Lisp Types Demystified (Part I: The Span of Common Lisp Types)

  • Samum Gromoff

    Are you familiar with Shen (formerly known as Qi), and if so, what do you think of it?

    It is supposed to have a much more expressive type system.

  • Robert Goldman

    Can’t we make your parametric data types using macros? Those would allow us to supply the type parameter at runtime…

    As you say, though, I’m not sure what we would win by doing this, so it’s not a big deal….

    Is there a more exotic solution using the MOP?

    • We could generate new kinds of structs/classes with a macro, essentially creating a C++ template-style kind of thing. That is, we could do

      (define-tree-type (some-type)
         `(defstruct ,(symbolicate 'tree- some-type)
            ....))
      

      But we lose more than we gain. We don’t have some overarching type called tree that we can use within our algorithms, which is the point of real parametric types. Our algorithms/functions which make use of this type must be specified statically, and can’t be chosen dynamically with e.g. type inference.

      I don’t think the MOP provides anything useful here.

  • lisp-interface-library offers a nice (IMHO) way to combine CLOSsy ad-hoc polymorphism and Haskell-style parametric polymorphism. It can express dependent types. It doesn’t have a good story for sum types (yet), but if anyone has one, it can probably be amended to do it.

  • How about:

    (deftype maybe (s)
      `(or (cons (member :just) ,s)
           (cons (member :nothing) null)))
    
    (typep '(:just . 3) '(maybe integer)) => T
    (typep '(:nothing)  '(maybe integer)) => T
    (typep '(:just . 3) '(maybe *))       => T
    (typep '(:nothing)  '(maybe *))       => T
    (typep '(:just . 3) '(maybe string))  => NIL
    (typep '(:nothing)  '(maybe string))  => T
    

    • This isn’t bad and it’s pretty lispy, but it’s still an alias, akin to macro expansion, and therefore still is conflated with plain old conses.

      Also, I’d probably do:

      (deftype maybe (s)
        `(or (cons (member :just) ,s)
             (member :nothing)))
      

      :)

  • Of course, this is better: (deftype maybe (&optional (s '*)) ...)

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Before you post, please prove you are sentient.

what is 7 + 9?