In this post, I explain a trick for avoiding duplication of types between .ml and .mli files that will be familiar to anyone who's worked with Jane Street codebases.
The problem
OCaml compilation units live a double life: one as source code (foo.ml) and one as header information (foo.mli).
This split works well for encouraging abstraction, so you'll often see less type information in the .mli than in the .ml, but any types that are not abstracted must be written out twice. This is a big deal for functor-heavy projects, since large module types end up duplicated across the two files.
There are a couple of standard mitigations for this:
- Move all of your types into a single file with no corresponding .mli. Each foo.{ml,mli} file can now alias the types and module types from a central s.ml or types.ml file. Unfortunately, all those types are now defined separately from their point of use, making your codebase harder to understand and less scalable.
- Minimise the number of module types being defined. Since we're paying twice for each module type we define, it's natural to want to define as few of them as possible. For instance, we might avoid defining a MAKER type for our Make functor and just keep the constraints in the .mli file instead. Unfortunately, this hides the constraints from Merlin, so you won't discover any discrepancies until compile time:
(* --- foo.mli -------------------------------------------------------------- *)
module Make (A: Arg.S) : S with type arg = A.t
(* --- foo.ml --------------------------------------------------------------- *)
module Make (A : Arg.S) = struct
(* We must define [type arg = A.t], but Merlin doesn't know this *)
type arg = string
end
Both of these mitigations have their drawbacks. If only our foo.ml could refer to the module types defined in foo.mli. Hmm...
The solution (?)
As with most problems, we can solve this with another layer of indirection. We add a third file, named foo_intf.ml. This file holds types and signatures, so it is like our old foo.mli file, but it has the distinct advantage that foo.ml can pull types from it. Now our types are defined in exactly one place, with no unnecessary duplication. The foo_intf.ml file contains all of the types required by foo.ml and also defines a special module type Intf to act as the public interface:
(* --- foo_intf.ml ---------------------------------------------------------- *)
(* Type definitions go here: *)
module type S = sig ... end
module type MAKER = functor (A: Arg.S) -> S with type arg = A.t
type error = [ `Msg of string | `Code of int ]
(* The interface of [foo.ml]: *)
module type Intf = sig
type error
(** [error] is the type of errors returned by {!S}. *)
module type S = S
module type MAKER = MAKER
module Make : MAKER
end
(* --- foo.ml --------------------------------------------------------------- *)
(* Fetch module types and type definitions from the [_intf] file *)
include Foo_intf
(* Implementation here as normal *)
module Make : MAKER = functor (A : Arg.S) -> struct ... end
(* --- foo.mli -------------------------------------------------------------- *)
include Foo_intf.Intf (** @inline *)
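Since the files above elide their bodies, here is a self-contained sketch of the same layout squeezed into one file so that it compiles as-is. Each submodule stands in for a file: Arg for a hypothetical arg.ml, Foo_intf for foo_intf.ml, and the constraint "Foo : Foo_intf.Intf" for foo.mli. The contents of Arg.S and S (a to_string and a describe function) are our own invention, not part of the original example:

```ocaml
(* --- arg.ml (hypothetical) --- *)
module Arg = struct
  module type S = sig
    type t
    val to_string : t -> string
  end
end

(* --- foo_intf.ml: every type is defined exactly once, here --- *)
module Foo_intf = struct
  module type S = sig
    type arg
    val describe : arg -> string
  end

  module type MAKER = functor (A : Arg.S) -> S with type arg = A.t

  type error = [ `Msg of string | `Code of int ]

  (* The interface of [foo.ml]: *)
  module type Intf = sig
    type error
    module type S = S
    module type MAKER = MAKER
    module Make : MAKER
  end
end

(* --- foo.ml, sealed by foo.mli (= include Foo_intf.Intf) --- *)
module Foo : Foo_intf.Intf = struct
  include Foo_intf

  (* The [MAKER] annotation is checked right here at the definition, so
     writing [type arg = string] below would be rejected on this functor. *)
  module Make : MAKER = functor (A : Arg.S) -> struct
    type arg = A.t
    let describe x = "arg: " ^ A.to_string x
  end
end

(* Usage: instantiate the functor with a concrete argument. *)
module Int_arg = struct
  type t = int
  let to_string = string_of_int
end

module Int_foo = Foo.Make (Int_arg)

let () = print_endline (Int_foo.describe 3) (* prints "arg: 3" *)
```

Because Make carries its MAKER annotation inside the implementation, a mismatch is reported at the definition of Make rather than only when the implementation is checked against the interface.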
There are some nice advantages to this approach:
- We've avoided duplicate definitions of foo's module types and kept them in the foo* namespace in our source tree. The code is now easier to change[1] and easier to understand.
- Since we no longer have to minimise our use of module types, we can give the types of functors at the point of definition (module Make : MAKER = ...). This style works better with Merlin, which can then flag a mismatch in foo.ml itself instead of leaving it to the compiler's check against foo.mli.
The _intf style is commonly used in Jane Street packages (e.g. higher_kinded, base, core). Note that it's typically only used for files that export module types, for which this trick is most effective.
I hope you find this technique useful in making your OCaml code more concise and less frustrating to work with.
Appendix A: hiding _intf from Odoc
Use of the _intf trick is an implementation detail that (ideally) shouldn't be exposed in your documentation. At the time of writing, Odoc renders all includes and type aliases with links to the source definition. In the case of included module types, you can use the @inline annotation to prevent Odoc from displaying the indirection:
include Foo_intf.Intf (** @inline *)
Unfortunately, (a) there's no equivalent trick for plain type definitions, and (b) any cross-references between module types will link to the true definition. This leaves you with rendered output like the following:
module Make : functor (Input : Foo__.Foo_intf.INPUT) -> S
where INPUT is defined in the Foo_intf file but accessible to the user as Foo.INPUT (via an alias).
Fortunately, the new Odoc model solves this problem by generating links to "canonical" definitions of types, which are never taken from hidden modules (those with double underscores, like Foo__). At the time of writing, this new model hasn't yet been released.
Appendix B: faster file bootstrapping
An interesting side-effect of being able to reference interfaces from implementations is that you can use them to kick-start initial development on a file. If your development process begins by defining signatures, the .ml + .mli workflow requires a secondary step of "add stub implementations of everything" to sneak past the type-checker:
(* --- stack.mli ------------------------------------------------------------ *)
type 'a t
val empty : 'a t
val push : 'a t -> 'a -> 'a t
val pop : 'a t -> ('a t * 'a) option
(* --- stack.ml ------------------------------------------------------------- *)
type 'a t
let empty = failwith "TODO"
let push = failwith "TODO"
let pop = failwith "TODO"
With an _intf file, we can provide all of these stubs in one go:
(* --- stack.ml ------------------------------------------------------------- *)
include (val (failwith "TODO") : Stack_intf.Intf)
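The stub only gets you past the type-checker: the failwith is evaluated when the module is initialised, so running any program that links it raises Failure "TODO". As a sketch of the next step, here is a list-based implementation (our own illustration, not from the original post) that replaces the stub while still being checked against the same interface. Stack_intf is simulated as a submodule so the block is self-contained:

```ocaml
(* --- stack_intf.ml (simulated as a submodule) --- *)
module Stack_intf = struct
  module type Intf = sig
    type 'a t
    val empty : 'a t
    val push : 'a t -> 'a -> 'a t
    val pop : 'a t -> ('a t * 'a) option
  end
end

(* --- stack.ml: the one-line stub replaced by a real implementation,
   still sealed by the same interface --- *)
module Stack : Stack_intf.Intf = struct
  (* A persistent stack: the head of the list is the top of the stack *)
  type 'a t = 'a list

  let empty = []
  let push s x = x :: s

  let pop = function
    | [] -> None
    | x :: rest -> Some (rest, x)
end
```

Note that this Stack shadows the standard library's Stack module within the file; in a real project it would simply be the module defined by stack.ml.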
(I learned about this trick from a blog post by Carl Eastlund.)
Appendix C: teaching Emacs about _intf files
Emacs users with tuareg-mode can use tuareg-find-alternate-file to quickly jump between corresponding .ml and .mli files. If you use this feature (as I do), you'll want it to be aware of _intf files. This can be done by customising the tuareg-other-file-alist variable to include the correspondence <foo>.ml ↔ <foo>_intf.ml:
;; Add support for `foo_intf.ml' ↔ `foo.ml' in tuareg-find-alternate-file
(custom-set-variables
'(tuareg-other-file-alist
(quote
(("\\.mli\\'" (".ml" ".mll" ".mly"))
("_intf\\.ml\\'" (".ml"))
("\\.ml\\'" ("_intf.ml" ".mli"))
("\\.mll\\'" (".mli"))
("\\.mly\\'" (".mli"))
("\\.eliomi\\'" (".eliom"))
("\\.eliom\\'" (".eliomi"))))))
If you're currently looking at some file foo.ml, tuareg-find-alternate-file will try to open foo_intf.ml and then foo.mli, in that order. (If one of the two already has an open buffer, that will take priority.)
Changelog
- 2020-06-10: changed the recommended name of the interface module type from Foo_intf.Foo to Foo_intf.Intf. In the time since I originally wrote this post, I've come to dislike the duplication of the module name in the Jane Street convention: in practice, Foo is often quite long and subject to later renaming.
[1] By reducing the repetition viscosity of our notation for types.