> It’s possible to write the lexer and parser entirely by hand
I write mine all by hand. It's the easiest part of a compiler to write, by far. It's also the least troublesome.
One advantage of doing them by hand is that better, more targeted error messages are easier to fold in.
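As a rough illustration of why this part is considered easy: a hand-rolled lexer for a toy expression language is mostly one loop and a switch. This is a generic sketch (the token kinds and the input language are invented for the example, not taken from the article); note how carrying a position on every token is what makes the targeted error messages cheap later.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token kinds for a tiny expression language (illustrative names).
type Kind int

const (
	NUMBER Kind = iota
	IDENT
	OP
	EOF
)

type Token struct {
	Kind Kind
	Text string
	Pos  int // byte offset into the source, for error messages
}

// Lex walks the input once, emitting tokens for digits, identifiers,
// and single-character operators, skipping spaces and tabs.
func Lex(src string) []Token {
	var toks []Token
	i := 0
	for i < len(src) {
		c := rune(src[i])
		switch {
		case c == ' ' || c == '\t':
			i++
		case unicode.IsDigit(c):
			start := i
			for i < len(src) && unicode.IsDigit(rune(src[i])) {
				i++
			}
			toks = append(toks, Token{NUMBER, src[start:i], start})
		case unicode.IsLetter(c):
			start := i
			for i < len(src) && unicode.IsLetter(rune(src[i])) {
				i++
			}
			toks = append(toks, Token{IDENT, src[start:i], start})
		default:
			toks = append(toks, Token{OP, string(c), i})
			i++
		}
	}
	return append(toks, Token{EOF, "", len(src)})
}

func main() {
	for _, t := range Lex("x + 42") {
		fmt.Printf("%d:%q@%d ", t.Kind, t.Text, t.Pos)
	}
	fmt.Println()
}
```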
sph 6 hours ago [-]
> I write mine all by hand. It's the easiest part of a compiler to write, by far. It's also the least troublesome.
It's also the most annoying if you're writing a new language. You want to iterate on its ideas, but can't do so until you have a parser done.
I've been designing a few language concepts over the past year, and it feels like 80% of that time has been spent writing and debugging parsers; by the time I get to the meat of language design - the shape of its AST, its semantics - any small syntactic change means going back to update the lexer and parser stages. It doesn't help that I can't settle on a syntax.
BTW I first started with PEGs, which are nice in theory, but I feel the separation of lexing and parsing stages is very helpful for reducing boilerplate (handling whitespace in PEG is obnoxious). Later, I hand-wrote my parsers (in C), but it got so repetitive that I've dedicated a weekend to just learning lex/yacc (actually flex/bison). Even if parsers are easy to write, it's good to have higher-level tools to reduce the tedium.
> You want to iterate on its ideas, but can't do so until you have a parser done.
Embed your language into the host language. It is simple to do even in C++.
Iterate on ideas to your heart's content, then add syntax.
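A minimal sketch of what "embed your language into the host language" can look like, here in Go rather than C++ (all the type names are invented for the example): the AST is built directly with host-language constructors, so the semantics can be iterated on with no lexer or parser at all.

```go
package main

import "fmt"

// Expr is the AST of the embedded language. The constructors below stand
// in for surface syntax until the design settles.
type Expr interface{ Eval(env map[string]int) int }

type Num int
type Var string
type Add struct{ L, R Expr }
type Let struct {
	Name string
	Val  Expr
	Body Expr
}

func (n Num) Eval(map[string]int) int       { return int(n) }
func (v Var) Eval(env map[string]int) int   { return env[string(v)] }
func (a Add) Eval(env map[string]int) int   { return a.L.Eval(env) + a.R.Eval(env) }

// Let binds Name in the environment, then evaluates Body.
// (Scoping is deliberately naive: bindings leak into env.)
func (l Let) Eval(env map[string]int) int {
	env[l.Name] = l.Val.Eval(env)
	return l.Body.Eval(env)
}

func main() {
	// "let x = 2 in x + 40", written directly in the host language.
	prog := Let{"x", Num(2), Add{Var("x"), Num(40)}}
	fmt.Println(prog.Eval(map[string]int{})) // prints 42
}
```

Adding a new construct is just adding a struct and an Eval method; only once the language stabilizes does a concrete syntax (and a parser for it) need to exist.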
gpderetta 2 hours ago [-]
Alternatively (or concurrently) use s-expressions and add syntactic sugar later.
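For the s-expression route, a sketch of how little machinery it takes (illustrative code, not from the thread): one generic reader survives every future change to the language, because new constructs are just new list shapes.

```go
package main

import (
	"fmt"
	"strings"
)

// Read parses one s-expression from a token stream into nested values:
// atoms are strings, lists are []interface{}. It returns the value and
// the remaining tokens.
func Read(toks []string) (interface{}, []string) {
	if len(toks) == 0 {
		return nil, toks
	}
	tok := toks[0]
	toks = toks[1:]
	if tok == "(" {
		var list []interface{}
		for len(toks) > 0 && toks[0] != ")" {
			var e interface{}
			e, toks = Read(toks)
			list = append(list, e)
		}
		if len(toks) > 0 {
			toks = toks[1:] // consume ")"
		}
		return list, toks
	}
	return tok, toks
}

// Tokenize splits on whitespace after padding parentheses.
func Tokenize(s string) []string {
	s = strings.ReplaceAll(s, "(", " ( ")
	s = strings.ReplaceAll(s, ")", " ) ")
	return strings.Fields(s)
}

func main() {
	expr, _ := Read(Tokenize("(let (x 2) (+ x 40))"))
	fmt.Println(expr)
}
```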
fredrikholm 6 hours ago [-]
I suspect that this is the more common opinion, especially when the desired outcome is real world use.
Recursive descent is surprisingly ergonomic and clean if one gets the heuristics right. Personally I find it way easier than writing BNF and its derivatives as you quickly get into tricky edge cases, slow performance and opaque errors.
hombre_fatal 34 minutes ago [-]
Yeah, grammars seem easy at first, but they're full of arcane knowledge: writing your tokens with the right affinity and greediness, ensuring backtracking is performant, and making sure you'll get good error messages.
Same with parser combinators. Only after a bunch of trial and error do you build up the intuitions you need to use them in production, I think.
Despite two decades of using those, I've found it much simpler to write my own scanning or RD parser.
dboreham 40 minutes ago [-]
In my lifetime parsers have gone from: written by hand with great effort, to: use a parser generator to be sure the grammar is correct and save effort, to: just write by hand because it's not that hard and error reporting, to: just use AI.
WoodenChair 9 hours ago [-]
This is very similar to the project I have in Chapter 2 of my new book, Computer Science from Scratch [0]. It's also Tiny BASIC without INPUT; I called it NanoBASIC. But it's an interpreter, not a compiler. This tutorial is a nice starting point. The chapter is much more comprehensive, so if you want to get into the weeds, I can recommend my own chapter (of course). But it's in Python, not Go. The code is on GitHub [1]. But this tutorial is great too.
0: https://nostarch.com/computer-science-from-scratch
1: https://github.com/davecom/ComputerScienceFromScratch
I have played around with the Crafting Interpreters book. Maybe the author oversimplifies a bit, but it seems with yacc and lex that nothing is really stopping anyone from making their own language and compiler. It always seemed like black magic.
But the same could be said about books, nothing is stopping you from writing a book except good ideas, story, and structure.
azhenley 12 hours ago [-]
TinyBASIC is fun and beautifully simple. I wrote a 3-part tutorial for making a TinyBASIC-to-C compiler using Python a few years ago:
Let’s make a Teeny Tiny compiler https://austinhenley.com/blog/teenytinycompiler1.html
I know BASIC is kind of a “bad” language, but there’s something so delightful about it. If we’re plugging TinyBASIC projects that others might find interesting, I made an MMO TinyBASIC REPL the other day: http://10klob.com/
pjmlp 7 hours ago [-]
People too often complain about original BASIC, and forget that most dialects moved away from line numbers and spaghetti GOTOs during the 16-bit days, with the widespread adoption of compilers and structured constructs.
I am really glad that I only got to learn C after getting through Turbo Basic, Quick Basic, Turbo Pascal[0], doing exactly the kind of stuff urban myths say was only possible after C came to be.
[0] - On 16-bit systems; I started coding on an 8-bit Timex 2068.
musicale 8 hours ago [-]
BASIC is an amazing language that computing novices (including humanities majors) could learn in an afternoon, that could be efficiently compiled or compactly interpreted, that was small enough to support dozens of interactive users on a mainframe or minicomputer, or to fit into a tiny 8-bit microcomputer – and yet was largely equivalent to FORTRAN in terms of its expressive power.
I think the closest modern equivalents might be Python (for easy onramp and scalability from microcontrollers to supercomputers) and JavaScript (for pure ubiquity in every device with a web browser.)
I wonder if there is a modern-ish (?) environment that can match Visual BASIC in terms of easy GUI app programming. Perhaps Python or Tcl with Tk (Qt seems harder) or maybe Delphi, or perhaps a modern Smalltalk.
AstralStorm 7 minutes ago [-]
Well, about this "afternoon" bit...
Advanced BASICs are too big for that, and in less advanced ones you get to POKE the hardware to do certain things.
Which means you get to learn a bunch of hardware and machine code. That's not all bad though!
andsoitis 8 hours ago [-]
Delphi for sure. And while you have to run it on Windows, it can create binaries for Windows, macOS, Linux, and mobile.
https://www.embarcadero.com/products/delphi
Delphi, and naturally Visual Basic for .NET with Windows Forms, not forgetting C#; however, C# has been getting a bit too many features lately, and is most likely not what the BASIC target audience would like.
fjfaase 8 hours ago [-]
Since 1990, I have developed software with C and later C++. Now that I am working on a C compiler, I am learning new things about the language. So, writing a compiler (or an interpreter) can really help you gain a deep understanding of a programming language.
bastawhiz 12 hours ago [-]
>The original authors of yacc were Mike Lesk and Eric Schmidt - yes that Eric Schmidt.
I don't know if it's worth mentioning, but the author of the post is David Singleton, the former CTO of Stripe. I almost hadn't noticed until I saw the domain.
refulgentis 12 hours ago [-]
I worked ~4 layers underneath him when he led Android Wear at Google, and every year or two that happens to me, and it puts a smile on my face. Gotta have love of the game to do this at that level.
IIRC, and man, maybe I'm making it up, but, lore was he always made time on a regular schedule to hack.
Usually 1 layer from the bottom isn't coding so much anymore.
(oddly, I didn't realize he was *CTO* of Stripe until a few months back, when his new thing with Hugo Barra was announced)
p4bl0 3 hours ago [-]
I like these kinds of fun projects!
I wrote a very small but complete compiler and VM for a very simple language: boolean expressions. I use it as a "what to expect" type of introduction during the first session of my compiler course.
The whole code is here; it is less than 150 lines of OCaml code (plus a few lines of C for the VM) and uses standard parsing tools: https://gist.github.com/p4bl0-/9f4e950e6c06fbba7e168097d89b0...
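For flavor, here's what such a "what to expect" example might look like. This is a Go sketch of the same idea (compile boolean expressions to bytecode for a tiny stack VM), not the commenter's OCaml code; the opcode letters and type names are invented for the example.

```go
package main

import "fmt"

// Op is one VM instruction: 'T'/'F' push a constant, '&', '|', '!' operate
// on the stack.
type Op struct {
	Code byte
}

// Expr compiles itself to a flat bytecode sequence.
type Expr interface{ Compile() []Op }

type Const bool
type Not struct{ E Expr }
type And struct{ L, R Expr }
type Or struct{ L, R Expr }

func (c Const) Compile() []Op {
	if c {
		return []Op{{'T'}}
	}
	return []Op{{'F'}}
}
func (n Not) Compile() []Op { return append(n.E.Compile(), Op{'!'}) }
func (a And) Compile() []Op { return append(append(a.L.Compile(), a.R.Compile()...), Op{'&'}) }
func (o Or) Compile() []Op  { return append(append(o.L.Compile(), o.R.Compile()...), Op{'|'}) }

// Run executes the bytecode on a stack of booleans and returns the result.
func Run(prog []Op) bool {
	var st []bool
	pop := func() bool { v := st[len(st)-1]; st = st[:len(st)-1]; return v }
	for _, op := range prog {
		switch op.Code {
		case 'T':
			st = append(st, true)
		case 'F':
			st = append(st, false)
		case '!':
			st = append(st, !pop())
		case '&':
			b, a := pop(), pop()
			st = append(st, a && b)
		case '|':
			b, a := pop(), pop()
			st = append(st, a || b)
		}
	}
	return st[0]
}

func main() {
	// not(true and false) or false
	prog := Or{Not{And{Const(true), Const(false)}}, Const(false)}.Compile()
	fmt.Println(Run(prog)) // prints true
}
```

The whole compile-then-execute pipeline of a "real" compiler is visible in miniature: an AST, a code generator, and a machine that runs the generated code.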
Nice to see that the red dragon book is still a thing. I've heard people have stopped using the wizard book.
musicale 9 hours ago [-]
> Yes, this is what I do for fun.
Don't we all? ;-)
matthewmueller 11 hours ago [-]
Love reading these. Keep these blog posts coming!
teo_zero 8 hours ago [-]
Wait! What are == and != doing in a BASIC language? Heresy! :)
dps 7 hours ago [-]
Yeah, I really should have included <> :-)
Fun to see this post from the deep archive get some interest - thanks for reading!
TMWNN 13 hours ago [-]
I thought a compiler, with no adjective or caveat, should turn a HLL into machine language. Isn't what this describes—turning BASIC into Go—more accurately described as a "pseudocompiler" or "Go compiler" or somesuch? I know Emacs is always said to have a "bytecode compiler" that processes Elisp code, not a "compiler" per se. Am I mistaken?
tuveson 11 hours ago [-]
This kind of question winds up being the CS equivalent of “is a hotdog a sandwich”. I agree that transpiler is a more accurate term for it and that a hotdog is not a sandwich. But there are lots of languages that start life as compile-to-C things. Many compiled languages today output LLVM IR which is not machine language. Similarly people would probably call javac a compiler, even though it outputs bytecode.
vrighter 6 hours ago [-]
A compiler is a language translator. To me it makes no difference whether it generates machine code (one type of language, machine executable), asm (equivalent language but which needs additional preprocessing into machine code), or some other language (which also needs additional preprocessing into machine code, but the tool for this is another compiler). Or it might output Java bytecode or MSIL, which doesn't even target a real, physical machine at all.
They translate one language into another. The line between compiler/transpiler just doesn't make sense to me.
orthoxerox 4 hours ago [-]
Let's put it this way: if a program translates code written in a "smarter" language into code written in a "stupider" language, then it is okay to call it a compiler.
If it translates a restricted subset of BASIC into Go, it doesn't really do anything beyond replacing one syntax with another.
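To make the "replacing one syntax with another" claim concrete, here's a hypothetical hand translation of a Tiny BASIC fragment into Go (not taken from the article's actual output). GOTO is the one construct that doesn't map onto structured syntax, so it falls back on Go's labels:

```go
package main

import "fmt"

// A possible line-for-line rendering of this Tiny BASIC fragment:
//
//   10 LET A = 5
//   20 PRINT A
//   30 LET A = A - 1
//   40 IF A > 0 THEN GOTO 20
//
// run executes the translated program and returns the values PRINT produced.
func run() []int {
	var out []int
	var a int
	a = 5 // 10 LET A = 5
l20:
	out = append(out, a) // 20 PRINT A
	a = a - 1            // 30 LET A = A - 1
	if a > 0 {           // 40 IF A > 0 THEN GOTO 20
		goto l20
	}
	return out
}

func main() {
	fmt.Println(run()) // prints [5 4 3 2 1]
}
```

Each BASIC line becomes a Go statement, and each line number that is a GOTO target becomes a label, which is roughly what makes the "stupider to smarter" direction feel like pure syntax replacement.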
fao_ 13 hours ago [-]
Strictly speaking it's a transpiler, but honestly the delta between the target language (Go) and the source language (BASIC) is very fluffy and wooly; from what I remember of my PL theory days, the distinction was always fuzzy enough that people used whatever term felt right to them.
An example off the top of my head: Chicken Scheme (call-cc.org) calls itself a compiler, but its target language is C.
shakna 6 hours ago [-]
"Transpile" is a shortening of the older term "trans-compile". [0]
It's a subset. All transpilers are compilers. Not all compilers are transpilers.
[0] Amiga BASIC called itself a transcompiler, from memory.
ethan_smith 6 hours ago [-]
A compiler is any program that translates from one language to another (source-to-source, source-to-bytecode, or source-to-machine code), so translating BASIC to Go is indeed a proper compiler, just as GCC translating C to LLVM IR before machine code is still a compiler.
pxc 13 hours ago [-]
The standard term for this kind of compiler is "transpiler", afaik.
Here's the Wikipedia page for such things, which also taught me several other names for them:
https://en.m.wikipedia.org/wiki/Source-to-source_compiler
They do, but that article also mixes “transpile” and “compile” often enough that it is nearly impossible to deduce what different meanings they might ascribe to them.
khaledh 57 minutes ago [-]
The original term for "compiler" was not restricted to compiling down to machine code. From Grace Hopper's paper "The Education of a Computer" (1952)[1]:
> Specifications for computer information, a catalogue, compiling routines, and subroutines will be given after adding another level to the block diagram. As Fig. 5 stands the mathematician must still perform all mathematical operations, relegating to the UNIVAC programming and computational operations. However, the computer information delivered by the mathematician no longer deals with numerical quantities as such. It treats of variables and constants in symbolic form together with operations upon them. The insertion of a fourth level of operation is now possible, Fig. 6. Suppose, for example, the mathematician wishes to evaluate a function and its first n derivatives. He sends the information defining the function itself to the UNIVAC. Under control of a "compiling routine of type B", in this case a differentiator, using task routines, the UNIVAC delivers the information necessary to program the computation of the function and its derivatives. From the formula for the function, the UNIVAC derives the formulas of the successive derivatives. This information processed under a compiling routine of Type A yields a program to direct the computation.
Notice the case for two "compilers": Compiler B (the differentiator) compiles symbolic notation into another program, which is then fed to Compiler A, which produces an executable that does the actual computation.
IMO, a better term for a compiler would have been a "translator."
[1] https://dl.acm.org/doi/pdf/10.1145/609784.609818
So, if he had invoked go for you, would it be a compiler?
Another definition is that it translates a source language into a target language.
icemanind 11 hours ago [-]
I wrote a full-fledged interpreter by hand for a vintage BASIC language. I called it Basice (I'm Icemanind, so Basice is a play on BASIC and Iceman). It's a little outdated now though. I wrote it using C# and it doesn't use any external libraries, like yacc or flex. It's all by hand. You can see it at https://github.com/icemanind/Basice
Incorrect: they were the authors of lex. yacc was authored by Stephen Johnson.
Surprising to me is that all the authors are still around, even though the tools are over 50 years old! It shows how young the computer science field is.
https://www.cs.princeton.edu/~bwk/
Not the one who was ever Google's CEO, right?