3.13 Type Inferrer

You must complete this assignment using a specific language. Choose Determine language from source in the DrRacket menu, then write
#lang plai
as the first line of your program, overwritting the default #lang at the top of the file.

The grammar for the language:

expr	=	num
	\|	true
	\|	false
	\|	(+ expr expr)
	\|	(- expr expr)
	\|	(* expr expr)
	\|	(iszero expr)
	\|	(bif expr expr expr)
	\|	id
	\|	(with (id expr) expr)
	\|	(rec (id expr) expr)
	\|	(fun (id) expr)
	\|	(expr expr)
	\|	tempty
	\|	(tcons expr expr)
	\|	(tempty? expr)
	\|	(tfirst expr)
	\|	(trest expr)

where num is a Racket number and id is an identifier not otherwise mentioned in the grammar.

Warning: Many students incorrectly parse Racket booleans (#t, true, #f, etc) rather than the symbols 'true and 'false for the booleans of this language.

There are no type expressions this time. Your completed program will infer types for all expressions, including functions.

3.13.1 Template

You must use the following template:

(define-type Expr
  [num (n number?)]
  [id (v symbol?)]
  [bool (b boolean?)]
  [bin-num-op (op procedure?) (lhs Expr?) (rhs Expr?)]
  [iszero (e Expr?)]
  [bif (test Expr?) (then Expr?) (else Expr?)]
  [with (bound-id symbol?) (bound-body Expr?) (body Expr?)]
  [rec-with (bound-id symbol?) (bound-body Expr?) (body Expr?)]
  [fun (arg-id symbol?) (body Expr?)]
  [app (fun-expr Expr?) (arg-expr Expr?)]
  [tempty]
  [tcons (first Expr?) (rest Expr?)]
  [tfirst (e Expr?)]
  [trest (e Expr?)]
  [istempty (e Expr?)])

(define-type Type
  [t-num]
  [t-bool]
  [t-list (elem Type?)]
  [t-fun (arg Type?) (result Type?)]
  [t-var (v symbol?)])

(define-type Constraint
  [eqc (lhs Type?) (rhs Type?)])

3.13.2 Parsing

Write the following function:

procedure
(parse se) → Expr?
se : s-expression?

Parses expressions from the grammar.

You may assume that the s-expression supplied to parse are valid.

3.13.3 Alpha Renaming

Write the following function:

procedure
(alpha-vary e) → Expr?
e : Expr?

Renames all the identifiers in e to new unique identifier, such that the same identifier is never used twice.

You will find make-immutable-hasheq, hash-ref, hash-set, and gensym useful for this part.

The function gensym returns a unique symbol each time it is applied. (gensym accepts an optional symbol as an argument; the result of (gensym 'x) is then a unique symbol that “looks like” 'x.)

Examples:

> (gensym)
'g2568
> (gensym 'x)
'x2569

If we were to call alpha-vary on (parse '(+ (with (x 4) x) (with (x 5) x))), it should return something like (parse '(+ (with (x1 4) x1) (with (x2 5) x2))).

If alpha-vary is given something that refers to unbound identifiers, it should error. For example: (alpha-vary (parse '(with (x 5) y))).

3.13.4 Constraint Generation

Write the following function:

procedure
(generate-constraints e-id e) → (listof Constraint?)
e-id : symbol?
e : Expr?

Returns the constraints generated by e. e-id serves as e’s label in this list.

A few things to note:

List operations are polymorphic; that is, you can create lists of numbers, booleans, lists of functions from numbers to booleans, etc.
The bound body of a rec binding does not have to be a syntactic function. However, you may assume that the rec-bound identifier only appears under a fun in the bound body. In other words, the following expressions are legal:
(rec (f (fun (x) (f x)))
  ....)
(rec (f (with (y 4)
              (fun (x) (f y))))
  ....)
while the following are not legal:
(rec (f f)
  ....)
(rec (f (+ 1 f))
  ....)
During constraint generation, you will need to generate fresh identifiers for labels. Use gensym. For easier debugging, you should pass an additional argument so you can tell what kind of expression the label refers to; for example: (gensym 'app-fun-pos) for the expression in the function position of an application.
Use constraint-list=? to test this.

3.13.5 Unification

Write the following function:

procedure
(unify loc) → (listof Constraint?)
loc : (listof Constraint?)

Implements the unification algorithm from the textbook. The list of constraints that is returned should only have t-vars on the left.

unify should signal an error if two types cannot be unified or if the occurs check fails.

I suggest writing very small examples to test the various cases of the algorithm.

3.13.6 Inferring Types

Write the following function:

procedure
(infer-type e) → Type?
e : Expr?

This should alpha-vary e, generate-constraints on it, run the unify algorithm, and finally extract the type from the substitution returned by unify.

This function (i.e. its tests) accounts for 65% of your grade. The other functions are each 5% and you should test them thoroughly (i.e. on a per-form basis.)

3.13.7 Testing Type Equality

infer-type returns a Type, so you will need to test types for equality. This can be tricky due to type variables, particularly because generate-constraints calls gensym to generate unique type variables.

Use the following support code to test if two types are equal modulo identifier renaming:

; type=?/mapping : hash hash Type Type -> Bool
; determines if types are equal modulo renaming
(define (type=?/mapping ht1 ht2 t1 t2)
  (define (teq? t1 t2)
    (type=?/mapping ht1 ht2 t1 t2))
  (cond
    [(and (t-num? t1) (t-num? t2)) true]
    [(and (t-bool? t1) (t-bool? t2)) true]
    [(and (t-list? t1) (t-list? t2))
     (teq? (t-list-elem t1) (t-list-elem t2))]
    [(and (t-fun? t1) (t-fun? t2))
     (and (teq? (t-fun-arg t1) (t-fun-arg t2))
          (teq? (t-fun-result t1) (t-fun-result t2)))]
    [(and (t-var? t1) (t-var? t2))
     (local ([define v1 ; the symbol that ht1 says that t1 maps to
               (hash-ref
                ht1 (t-var-v t1)
                (lambda ()
                  ; if t1 doesn't map to anything, it's the first
                  ; time we're seeing it, so map it to t2
                  (hash-set! ht1 (t-var-v t1) (t-var-v t2))
                  (t-var-v t2)))]
             [define v2
               (hash-ref
                ht2 (t-var-v t2)
                (lambda ()
                  (hash-set! ht2 (t-var-v t2) (t-var-v t1))
                  (t-var-v t1)))])
       ; we have to check both mappings, so that distinct variables
       ; are kept distinct. i.e. a -> b should not be isomorphic to
       ; c -> c under the one-way mapping a => c, b => c.
       (and (symbol=? (t-var-v t2) v1)
            (symbol=? (t-var-v t1) v2)))]
    [(and (Type? t1) (Type? t2)) false]
    [else (error 'type=? "either ~a or ~a is not a Type" t1 t2)]))

; type=? Type -> Type -> Bool
; signals an error if arguments are not variants of Type
(define ((type=? t1) t2)
  (or (type=?/mapping (make-hash) (make-hash) t1 t2)
      ; Unfortunately, test/pred simply prints false;
      ; this helps us see what t2 was.
      (error 'type=?
             "~s and ~a are not equal (modulo renaming)"
             t1 t2)))

(test/pred (t-var 'a)
           (type=? (t-var 'b)))
(test/pred (t-fun (t-var 'a) (t-var 'b))
           (type=? (t-fun (t-var (gensym)) (t-var (gensym)))))
(test/pred (t-fun (t-var 'a) (t-var 'b))
           (type=? (t-fun (t-var (gensym)) (t-var (gensym)))))
(test/pred (t-fun (t-var 'a) (t-var 'a)) ; fails
           (type=? (t-fun (t-var (gensym)) (t-var (gensym)))))
(test/pred (t-fun (t-var 'a) (t-var 'b)) ; fails
           (type=? (t-fun (t-var 'c) (t-var 'c))))
(test/exn ((type=? 34) 34) "not a Type")

; constraint-list=? : Constraint list -> Constraint list -> Bool
; signals an error if arguments are not variants of Constraint
(define ((constraint-list=? lc1) lc2)
  (define htlc1 (make-hash))
  (define htlc2 (make-hash))
  (or (andmap (lambda (c1 c2)
                (and
                 (type=?/mapping
                  htlc1 htlc2
                  (eqc-lhs c1) (eqc-lhs c2))
                 (type=?/mapping
                  htlc1 htlc2
                  (eqc-rhs c1) (eqc-rhs c2))))
              lc1 lc2)
      (error 'constraint-list=?
             "~s and ~a are not equal (modulo renaming)"
             lc1 lc2)))

3.13.8 Testing Hints

You may find Racket’s quasiquoting useful. For example, you could define

(define add1-fun '(fun (x) (+ x 1)))

to use in multiple test cases, such as

(test/pred (infer-type (parse add1-fun))
(type=? (t-fun (t-num) (t-num))))
(test/pred (infer-type (parse `(,add1-fun 1))) (type=? (t-num)))
(test/exn (infer-type (parse `(,add1-fun false))) "unify")

This becomes more useful as functions get more complex, like curried map. Yes, your test cases should get this elaborate, as well as covering the simple cases.

Make sure you have test cases for the occurs check [PLAI 282].

3.13.9 Extra Credit

For an amount of extra credit equal to a “special” test case, write a program in this language for which your algorithm infers the type

(t-fun (t-var 'a) (t-var 'b))

You shouldn’t attempt this problem until you’ve fully completed the assignment. Put "extra credit" in a comment above the test case to make it easy to spot.

3.13.10 What Not To Do

You do not need to implement an interpreter for this language.

You do not need to implement let-based polymorphism.

← prev up next →

1	Introduction
2	Lectures
3	Assignments
4	Turn-In and Grading
5	Help
6	Readings
7	Software
8	Fine Print

3.1	Basic Racket
3.2	Lists and Trees
3.3	Higher-order Functions
3.4	Rudimentary Interpreter
3.5	Extended Interpreter
3.6	Language Analysis the First
3.7	Game Survey
3.8	Analysis: Garbage Collection
3.9	Garbage Collectors
3.10	Web Game Survey
3.11	Analysis: Continuations
3.12	Type Checker
3.13	Type Inferrer
3.14	Analysis: Type Systems
3.15	Laziness and Infinite Data
3.16	Language Analysis the Second
3.17	Analysis: Laziness
3.18	Prolog

3.13.1	Template
3.13.2	Parsing
3.13.3	Alpha Renaming
3.13.4	Constraint Generation
3.13.5	Unification
3.13.6	Inferring Types
3.13.7	Testing Type Equality
3.13.8	Testing Hints
3.13.9	Extra Credit
3.13.10	What Not To Do