🦊 The Phox Programming Language

Phox is a small functional programming language.

It aims to be a simple yet expressive tool - your clever companion for exploring type theory and practical programming.

✨ Features

Hindley–Milner type inference
No need to annotate types in most cases.
(Phox includes a minimal type-annotation syntax for expressions, used only when disambiguation is needed.)
Algebraic data types (ADT)
Define expressive data structures with variants.
Pattern matching
Concise and powerful destructuring.
Newtype shorthand
Cleaner syntax for single-constructor wrapper types.
First-class functions
Functions are values, operators are functions too.
Generic function temlates
Non-first class / overloadable generic function templates.
Multi-parameter typeclasses (trait/impl)
Define the relationship between multiple methods and multiple types.
Trait record
Typeclasses (trait/impl) can be instansiated as a first-class record value, with a simple syntax.
Module system
Rust like module / namespace definition.
Simple syntax
Inspired by ML-family languages, with a focus on clarity.

Install Phox

⚠️ Work in progress — Phox is under active development.

Build

Clone the GitHub repository and build with Rust (tested on 1.80+, may work on earlier versions):

git clone https://github.com/mori0091/phox.git
cd phox
cargo build

Run in REPL

If you run without arguments, Phox starts an interactive REPL:

cargo run
> let rec fact = λn. if (0 == n) 1 else n * fact (n - 1);
> fact 5
=> 120: Int

In REPL, an input starts with : is recognized as a REPL command.
For example, :? shows the list of available commands:

cargo run
> :?

:quit, :q
    exit REPL.

:help, :h, or :?
    print this help messages.

:load <path>, :l <path>
    load and evaluate Phox source file specified by <path>.

:modules, :m
    print list of root modules.

:using, :u
    print list of using aliases for each modules.

:symbols, :s
    print list of symbols for each modules.

:impls, or :impls [options]
    print list of `impl`s.
    options:
        -v, or --verbose
            print also the `impl`s' implementation.

Run a program file

Pass a .phx file to execute it:

.phx is the conventional extension for Phox source files (plain text).

.txt files are also accepted.

cargo run examples/fact.phx
=> 120: Int

Run from stdin

You can also pipe code from stdin (use - explicitly):

echo "1 + 2" | cargo run -
=> 3: Int

Resulting value and inferred type are printed on success, like this:

=> value: type

On error, an error message is printed, like this:

parse error: UnrecognizedToken { /* snip... */ }

First Program

> println "Hello World!"
Hello World!
=> Ok (): Result Error ()

> println ("Hello" ++ " " ++ "World!")
Hello World!
=> Ok (): Result Error ()

> println <| fmt::show <| fmt::sep_by "_" <| "Hello World!"
H_e_l_l_o_ _W_o_r_l_d_!
=> Ok (): Result Error ()

Note

Phox is still under active development.
While most I/O primitives have not yet been implemented, but the following core features are already available:

Type-level Unicode String Framework (ScalarString, UTF-8, etc.)

Pretty-Printing Combinators

Rich pure-functional core libraries (iter, array, fmt, …)

Experimental print, println, eprint, and eprintln

Pipeline Example

Iterator / Generator / Sink pipelines

> counter 1 |> take 5 |> fold (+) 0
=> 15: Int

> counter 1 |> take 10 |> fold (+) 0
=> 55: Int

> counter 1 |> take 5 |> fold (*) 1
=> 120: Int

> counter 1 |> filter (|(% 2) >> |(== 0)) |> take 5 |> collect Nil
=> Cons 2 (Cons 4 (Cons 6 (Cons 8 (Cons 10 Nil)))): List Int

> counter 1 |> filter (|(% 2) >> |(== 0)) |> take 5 |> collect @[]
=> @[2, 4, 6, 8, 10]: @[Int]

> collect "abc" "def"
=> "abcdef": ScalarString

> "abc" ++ "def"
=> "abcdef": ScalarString

> @[1,2,3] ++ @[10,20]
=> @[1, 2, 3, 10, 20]: @[Int]

> "Hello" ++ " " ++ "World"
=> "Hello World": ScalarString

Note

The infix operator ++ is synonym of collect function.

Pretty-Printing combinator pipelines

> use ::core::fmt::*;
=> (): ()

> show <| hex-dump <| cast "🎉🤣👍🍺"
=> "F0 9F 8E 89 F0 9F A4 A3 F0 9F 91 8D F0 9F 8D BA": ScalarString

> show <| sep_by ", " <| map single-quote <| "🎉🤣👍🍺"
=> "'🎉', '🤣', '👍', '🍺'": ScalarString

Note

The ::core::fmt module is not imported by default.
You must either:

call functions with a module path such as fmt::show, or

explicitly import the module with use ::core::fmt::*;

Comments

Line comments

In regular expression: //[^\n\r]*

starts with //, followed by any characters (single-line text), and ends with newline.

// This is a line-comment.

////// This is also a line-comment.

Similar to C/C++ line-comments.

Block comments

In regular expression: /\*[^*]*\*+(?:[^/*][^*]*\*+)*/

starts with /*, followed by any charcters (single or multi-line texts), and ends with */.
Nesting is not allowed.

/* This is a block comment. */

/*
 * This is also a block comment.
 */

/*****
 *** This is also a block comment.
 **/

Similar to C/C++ block-comments.

Semicolon (item separator)

; separates multiple items (declarations / statements / expressions) in a block or at the top level.
Each item is evaluated in order; only the last expression’s value is returned.
If a block or top-level input ends with ;, an implicit () is added.

// Multiple items in a block
{
    let x = 1; // => (): ()  (discarded)
    let y = 2; // => (): ()  (discarded)
    x + y;     // => 3: Int  (discarded)
    2 * x + y  // => 4: Int  (result)
}
// => 4: Int

// Multiple items in the top level
let x = 1; // => (): ()  (discarded)
let y = 2; // => (): ()  (discarded)
x + y;     // => 3: Int  (discarded)
2 * x + y  // => 4: Int  (result)
// => 4: Int

// Items in a block ends with `;`
{
    1 + 2; // => 3: Int  (discarded)
}
// => (): ()

// Items in the top level ends with `;`
1 + 2; // => 3: Int  (discarded)
// => (): ()

// No items in a block
{}
// => (): ()

Primitive Types

type	values	description
`()`	`()`	Unit type, that consists of single value `()`.
`Bool`	`true`, `fales`	Boolean type. True or false.
`Int`	`0`, `100`, `-1`, …	Integer type. (currently 64-bit integer)
`u8`	`0u8`, `100u8`, `0xFFu8`, …	8-bit unsigned integer type.
`u16`	`0u16`, `100u16`, `0xFFFFu16`, …	16-bit unsigned integer type.
`u32`	`0u32`, `100u32`, `0xFFFFFFFFu32`, …	32-bit unsigned integer type.
`u64`	`0u64`, `100u64`, `0xFFFFFFFFFFFFFFFFu64`, …	64-bit unsigned integer type.
`ScalarValue`	`'A'`, `'\n'`, `'🎉'`, …	Unicode Scalar Value (i.e. character)
`ScalarString`	`"abc"`, `"🎉🤣👍🍺"`, …	Unicode UTF-8 string

Note

ScalarString is synonym of Str ScalarValue type.

Type definitions

type Option a = Some a | None;
type Pair a b = Pair a b;
type Result e a = Ok a | Err e;

Variants can take 0 or more arguments.
Newtype shorthand is available when:
- There is only one variant, and
- The type name and constructor name are the same, and
- The variant has exactly one tuple, one record, or one array argument.

// Normal form
type Point a = Point @{ x: a, y: a };
type Wrapper a = Wrapper (a,);

// Newtype shorthand
type Point a = @{ x: a, y: a };
type Wrapper a = (a,);

Language constructs

Identifiers

Variables and type variables

In regular expression:[a-z_]([-]*[a-zA-Z0-9_]+)*['?]*

seq, x', some?, foo-bar
fooBar, a0, take_while

Constructor, type constructor, and trait names

In regular expression:[A-Z][a-zA-Z0-9_]*

List, Nil, Eq, Iter

Module names

In regular expression:[a-z_]([-]*[a-zA-Z0-9_]+)*

core, cmp, iter

Module path (qualified module names)

A sequence of module names separated by ::

::core::cmp, ::core::list Absolute path
cmp, list, foo::bar Relative path

Qualified names

A module path followed by a variable, constructor, type constructor, or trait name

::core::iter::seq, ::core::iter::Seq, ::core::iter::Iter
iter::seq, iter::Seq, iter::Iter

Literals

() Unit value literal
true, false Boolean value literals
0,100, -1 Integer value literals
0u8, 100u8, 0xFFu8 8-bit unsigned integer literals
0u16, 100u16, 0xFFFFu16 16-bit unsigned integer literals
0u32, 100u32, 0xFFFFFFFFu32 32-bit unsigned integer literals
0u64, 100u64, 0xFFFFFFFFFFFFFFFFu64 64-bit unsigned integer literals
'A', '\n', '🎉' Unicode Scalar Value literals
"abc", "🎉🤣👍🍺" Unicode UTF-8 string literals

Primitive types

() Unit type
Bool Boolean type
Int Integer type
u8 8-bit unsigned integer type
u16 16-bit unsigned integer type
u32 32-bit unsigned integer type
u64 32-bit unsigned integer type
ScalarValue Unicode Scalar Value type
ScalarString Unicode UTF-8 string type (synonym of Str ScalarValue)
@[t] or @[] t Array type
(t,), (t1, t2) Tuple type
@{x:t1, y:t2} Record type
t1 -> t2 Function type
T t, t1 t2 Type-level function application

Operators

Infix (binary) operators

==, != Equality / Non-equality operators
<=, <, >, >= Comparison operators
&&, || Logical operators
+, -, *, /, % Arithmetic operators
|>, <| Pipeline operators
>>, << Function composition operators

Prefix operators

nagete, - Arithmetic negation
not, ! Boolean not

User defined infix operators

In regular expression:[*+\-/!$%&=^?<>]+

?>, ===, <+>

Operators as functions

(==) 1 1
(+) 1 2

Functions and constructors as operators

1 `Cons` Nil
x `@{Eq Int}.(==)` y

Declarations

At the top level of a module:

mod Define sub-module
use Import identifiers
type Define Algebraic Data Type (ADT)
trait Define Multi-Parameter Type Class (MPTC)
impl Define MPTC instance implementation
*let Define generic/overloaded function template
let Define a variable or function
let rec Define a recursive function

Statements

Within a local scope:

let Variable / function binding
let rec Recursive function binding

Note

In the case of let rec p = e;,

p must be a variable pattern.

In the case of let p = e;,

At the top level of a module, p must be a variable pattern.

Within a local scope, any pattern can be used for p.

Expressions

Value expressions

(evaluate to themselves; safe to generalize)

(), true, 0 Literals
λp.e/\p.e Lambda abstraction (function)
@[], @[e1], @[e1, e2] Array value
(e1,), (e1, e2) Tuple value
@{x = e1, y = e2} Record value
@{T t1 t2} Trait record value
|(e op)/|(e op _) Infix-operator partial application (bind the 1st argument)
|(op e)/|(_ op e) Infix-operator partial application (bind the 2nd argument)

Infix-operator partial applications (a.k.a. section syntax) are desugared into lambda abstraction.

Note

Old section syntax #(e op)/#(e op _) and #(op e)/#(_ op e) are deprecated.
Use new syntax |(e op)/|(e op _) and |(op e)/|(_ op e) instead.

Important

Old record value syntax @{x: e1, y: e2} are deprecated and no longer supported.
Use new syntax @{x = e1, y = e2} instead.

Non-value expressions

(require evaluation; not eligible for generalization)

x Evaluate variable
e1 e2 Function application
e1[e2] Array index access
e.0 Tuple index access
e.x Record field access
{ stmt1; stmt2; expr } Block expression (evaluates to the last expression; introduces a new scope)
if (e1) e2 else e3 If then else
match (e) { p1 => e1, .. } Pattern matching

Minumul Type Annotaions

e: t Optional type annotations for expressions
This adds a type equality constraint between the inferred type of e and the annotated type t.
This helps type inference and the constraint solver resolve the last mile of ambiguity.
Type annotations as signatures are required only in trait declarations.
Other bindings (such as let and lambda parameters) do not accept type signatures.

Note

The expr: t syntax is only permitted for atomic expressions.
If you need to type-annotate a complex expression,
enclose the expression in parentheses and follow it with the type annotation.

Examples:

let add = \a.\b. (a + b): Int;

add 2 + 3
// => 5: Int

(cast 10): u8
// => 10: u8

(cast 10): (Result e u8)
// => Ok 10: Result RuntimeError u8

Patterns

_ Wildcard pattern
(), true, 0 Literal pattern
'A', '🎉' Unicode Scalar Value literal pattern
x, foo Variable pattern
Nil, Cons p ps Constructor pattern
@[], @[p1, p2], @[p, ps..], @[p, ..], @[ps..], @[..] Array pattern
(p1,), (p1, p2) Tuple pattern
@{x = p1, y = p2}, @{x, y} Record pattern

Important

Old record pattern syntax @{x: p1, y: p2} are deprecated and no longer supported.
Use new syntax @{x = p1, y = p2} instead.

Phox Type System (Future)

Note

In this section, we will explore a vision for the Phox Type System that was technically feasible but was not adopted. For reference…

`Row` kind (Row polymorphism) and `Nat` kind (Type-level Natural numbers)

(kind)	variable	constraint solver	constraint	var-gen / union-find	constraint set
~~Kind~~	~~KindVarId `$a`~~	~~KindSolver~~	~~KindConstraint~~	~~KindVarContext~~	~~KindConstraintSet~~
Type	TypeVarId `a`	UnifiedSolver (Type + MPTC)	TypeConstraint	TypeContext	UnifiedConstraintSet (Type + MPTC)
Row	RowVarId `@a`	UnifiedSolver (Row)	RowConstraint	RowContext	UnifiedConstraintSet (Row)
Nat	NatVarId `#a`	UnifiedSolver (Nat)	NatConstraint	NatContext	UnifiedConstraintSet (Nat)

Note

TODO: Change section-syntax from #(op x) to |(op x).

Add new syntax |(op x) and mark old syntax as deprecated.

After Nat solver is implemented, remove old syntax #(op x).

Note

$a, @a, and #a are primarily used in pretty-printed inferred types.

KindVarId $a is internal use only for kind inference.

In user code,

type-level variables in expressions should generally be written as plain identifiers like a.

type-level variables in type/trait declarations should be written as a and #a for Type and Nat variables respectively.

As a future extension, ?a, #a, and @a may be accepted as syntactic sugar in expressions for explicit kind annotations (a : Type), (a : Nat), and (a : Row) respectively.

Note

Maybe Nat kind will not be implemented…
it’s not so useful.

Rough sketch of Phox Type System process-pipeline

parse: source code “a” → RawAST such as RawTyVar(“a”)
resolve phase: RawTyVar(“a”) → RowVarId(r), NatVarId(n), or TypeVarId(t)

if a appeared in Row context (@{ .. | a }), a shall be RowVarId(r)
if a appeared in type parameters (@{T a}), a shall be NatVarId(n) or TypeVarId(t)
- check type/trait declarations and its parameters’ kind,
- then assign NatVarId(n) or TypeVarId(t) according to the kind.

infer phase: infer and solve with UnifiedSolver
apply phase: bake specialized code

Note

row-var appears only in record/row contexts such as @{...}, so no ambiguity arises.

Since both type-var and nat-var can appear as type parameters only in trait record @{T a b}, they must resolve to either NatVarId or TypeVarId after kind-check and kind-assignment in resolve phase.

Kind inference and KindSolver are not needed for Phox.

Rough sketh of Phox Type System data sturctures

#![allow(unused)]
fn main() {
// Constructors for Kind expression
pub enum Kind {
    Fun(Box<Kind>, Box<Kind>),   // κ1 -> κ2
    Type,                        // τ
    Row,                         // ρ
    // Nat,                      // ν
}

pub enum TypeExpr {
    Ty(Type),
    Row(Row),
    // Nat(Nat),
}

// pub fn kind_of(texpr: &TypeExpr) -> Kind {
//     match texpr {
//         TypeExpr::Ty(ty) => match ty {
//             Type::Var(_) => {
//                 todo!()
//             }
//             Type::Con(_name) => {
//                 todo!()
//             }
//             Type::App(ref t1, ref t2) => {
//                 Kind::Fun(Box::new(kind_of(t1)), Box::new(kind_of(t2)))
//             }
//             Type::Fun(_, _) | Type::Tuple(_) | Type::Record(_) => {
//                 Kind::Type
//             }
//         }
//     }
// }

// Constructors for Type expression
pub enum Type {
    Var(TypeVarId),             // 型変数
    Fun(Box<Type>, Box<Type>),  // 関数型
    Con(Symbol),                // 型構築子

    App(Box<TypeExpr>, Box<TypeExpr>),  // 型適用

    Tuple(Vec<Type>),
    // Record(Vec<(String, Type)>), // => Type::App(Type::Con("@{}"), TypeExpr::Row(row))
}

pub struct TypeVarId(usize);  // `a`, `?100`

pub enum TypeConstraint {
    TypeEq(Type, Type),
    // -- for MPTC (`trait`/`impl`) and Overloaded Function Template (`*let`) --
    TraitBound(TraitHead),
    Overloaded(Symbol, Type, Vec<SchemeTemplate<Type>>),
}

// Constructors for Row expression
pub enum Row {
    Empty,                              // `@{}`
    Var(RowVarId),                      // `@{ | row }`
    Ext(Box<(Label, Type)>, Box<Row>),  // `@{ x:τ | row }`
}

pub struct RowVarId(usize);  // `@a`, `@?100`
pub type Label = String;

// T.B.D.
pub enum RowConstraint {
    RowEq(Row, Row),
    Has(Row, Label),
    Lacks(Row, Label),
}

// Constructors for Nat expression
// pub enum Nat {
//     Lit(usize),                         // `0`, `1`, ...
//     Var(NatVarId),                      // `n`
//     Add(Box<Nat>, Box<Nat>),            // `n + m`
// }

// pub struct NatVarId(usize);  // `#a`, `#?100`

// T.B.D.
// pub enum NatConstraint {
//     NatEq(Nat, Nat),
//     NatLT(Nat, Nat),
//     // NatEqu(...),   // natural number equation (e.g. `m + n = 0`)
// }

}

#![allow(unused)]
fn main() {
pub struct Scheme<T> {
    pub vars: Vec<Var>,             // quantified variables              (ex. `∀ a.`)
    pub constraints: ConstraintSet, // constraints / trait bounds        (ex. `(Eq a, Ord a) =>`)
    pub target: T,                  // the type, or                      (ex. `a -> a -> Bool`)
                                    // the trait head produced by `impl` (ex. `Eq (List a)`)
}

pub enum Var {
    Ty(TypeVarId),
    Row(RowVarId),
    // Nat(NatVarId),
}

pub enum Constraint {
    Ty(TypeConstraint),
    Row(RowConstraint),
    // Nat(NatConstraint),
}

pub struct ConstraintSet {
    pub primary:  Option<Box<TraitHead>>,
    pub requires: BTreeSet<Constraint>,
}

pub struct UnifiedSolver {
    // ...
}
}

🧭 Roadmap

✔️ Mission 1

Hindley-Milner type inference
Safe subset of System Fω
Algebraic Data Type (ADT)

✔️ Mission 2

Traits a.k.a., Type Class
Trait Record - Type Class as first-class record value

✔️ Mission 3

Multi-Parameter Type Class (MPTC)
Higher-order Trait Record - MPTC as first-class record value
Overloaded Function Template
Type and requires inference w/o type annotation

Extra Mission 1

Safe and Unsafe numeric conversion
Array, String type
Pretty-Printing Combinators
Stream for Parser Combinators
Parser Combinators
Safe Type Family (type trait/type impl)
String Family Literals

Extra Mission 2 - Proc system

Procedures with side-effects
Dynamic arrays (mutable / growable arrays)
Standard I/O
File I/O
Async-Computation and Async I/O combinators

Extra Mission 3 - Optimization

Unboxed arrays (for non-primitive types)
Object file, Link-Loader
Alias analysis and reuse analysis of heap memory.
LLVM IR code generator.

It’s a long, long journey…

🏁 Goal

Hello World!

> println "Hello, world! (Finally, we meet!)";
Hello, world! (Finally, we meet!)
=> (): ()

Keyboard shortcuts

The Phox programming language