Some time ago, I created a Go tool, github.com/g4s8/envdoc, which runs as part of go generate. It parses Go source files, extracts documentation for struct fields annotated with env tags, and generates documentation in markdown, HTML, or plaintext formats. This was an enlightening journey, as I had not previously engaged with Go generators nor worked with Go source file parsers. In this post, I will share my learnings from this experience and demonstrate how to create Go generators and AST parsers.
Let’s construct a simple tool that, when run by the go generate command from a //go:generate instruction, parses the struct following this comment and prints all fields and documentation to stdout.
The project structure for this example is as follows:
|- go.mod
|- go.sum
|- main.go (the tool's main file)
|- .testfiles (targets we will test on)
   |- target.go (the target file to extract field comments from)
Here’s the struct in .testfiles/target.go that we’ll use in this article:
package main

// Config is a configuration of the target server
//
//go:generate go run ../
type Config struct {
	// Host is a host name or IP address of the target server
	Host string
	// Port is a port number of the target server
	Port int
	// Protocol is a protocol of the target server
	Protocol string
	// Timeout is a timeout of the target server
	Timeout int
}
Upon executing the go generate command, we expect to see the following output:
$ go generate ./.testfiles/target.go
Config - Config is a configuration of the target server
- Host (string) - Host is a host name or IP address of the target server
- Port (int) - Port is a port number of the target server
- Protocol (string) - Protocol is a protocol of the target server
- Timeout (int) - Timeout is a timeout of the target server
Go Generators
The go generate documentation lists the environment variables that are available to the generator process:
Go generate sets several variables when it runs the generator:
	$GOARCH
		The execution architecture (arm, amd64, etc.)
	$GOOS
		The execution operating system (linux, windows, etc.)
	$GOFILE
		The base name of the file.
	$GOLINE
		The line number of the directive in the source file.
	$GOPACKAGE
		The name of the package of the file containing the directive.
	$GOROOT
		The GOROOT directory for the 'go' command that invoked the
		generator, containing the Go toolchain and standard library.
	$DOLLAR
		A dollar sign.
	$PATH
		The $PATH of the parent process, with $GOROOT/bin
		placed at the beginning. This causes generators
		that execute 'go' commands to use the same 'go'
		as the parent 'go generate' command.
We are particularly interested in the $GOFILE and $GOLINE variables: the former is needed to read the target file and parse it as an AST, and the latter to identify the go:generate statement line, enabling us to process the subsequent struct definition. Let’s begin crafting our main.go file:
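package main

import (
	"os"
	"strconv"
)

func main() {
	// go generate sets both variables for every generator it runs.
	targetFile := os.Getenv("GOFILE")
	targetLine, err := strconv.Atoi(os.Getenv("GOLINE"))
	if err != nil {
		panic(err)
	}
	// The snippets in the following sections continue main and use
	// targetFile and targetLine.
}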
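When the tool is started by hand rather than by go generate, both variables are empty and the panic above is not very informative. As a minimal sketch, the lookup could be wrapped in a helper that returns a descriptive error. The name generateTarget and its getenv parameter are hypothetical and exist only to make the logic easy to exercise without go generate:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// generateTarget reads the file name and directive line that go generate
// exposes through $GOFILE and $GOLINE. The getenv parameter is a hypothetical
// seam (not part of the original tool) so the lookup can be tested.
func generateTarget(getenv func(string) string) (file string, line int, err error) {
	file = getenv("GOFILE")
	if file == "" {
		return "", 0, fmt.Errorf("GOFILE is not set: run this tool via go generate")
	}
	line, err = strconv.Atoi(getenv("GOLINE"))
	if err != nil {
		return "", 0, fmt.Errorf("invalid GOLINE: %w", err)
	}
	return file, line, nil
}

func main() {
	file, line, err := generateTarget(os.Getenv)
	if err != nil {
		// A real tool would probably exit with a non-zero status here.
		fmt.Println("not started by go generate:", err)
		return
	}
	fmt.Printf("directive found at %s:%d\n", file, line)
}
```

Passing os.Getenv keeps the production path identical to the snippet above, while a map-backed function can stand in for the environment in tests.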
Parse Tokens
Next, we require a few Go packages to parse the target file:
go/ast
go/doc
go/parser
go/token
First, we use the go/token package, which defines the lexical tokens and source positions that the parser works with.
To parse the target file, we create a token.FileSet to track position information for every parsed file, then call parser.ParseFile, which returns an *ast.File and registers the file’s positions in the fileset.
Since we also need to extract comments, we pass the parser.ParseComments flag to parser.ParseFile:
fileSet := token.NewFileSet()
astFile, err := parser.ParseFile(fileSet, targetFile, nil, parser.ParseComments)
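The snippet above leaves err unchecked for brevity. To see what the parser.ParseComments flag changes, here is a small self-contained sketch that parses an in-memory source string with and without it. The helper name commentGroups and the sample source are assumptions made for illustration; the real tool passes nil as the source so that the parser reads $GOFILE from disk:

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// src is a hypothetical stand-in for the target file.
const src = `package demo

// Config is documented.
type Config struct {
	// Host is documented too.
	Host string
}
`

// commentGroups parses src and reports the package name together with the
// number of comment groups the parser attached to the file.
func commentGroups(src string, withComments bool) (string, int, error) {
	var mode parser.Mode
	if withComments {
		mode = parser.ParseComments
	}
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, mode)
	if err != nil {
		return "", 0, err
	}
	return file.Name.Name, len(file.Comments), nil
}

func main() {
	for _, withComments := range []bool{false, true} {
		name, n, err := commentGroups(src, withComments)
		if err != nil {
			panic(err)
		}
		fmt.Printf("package %s, ParseComments=%v: %d comment groups\n", name, withComments, n)
	}
}
```

Without the flag the comments are dropped during scanning, so neither the trigger comment nor the field documentation would be available later.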
File Line Positions
The next step is to extract line information. The AST stores token positions as offsets rather than line numbers, so we need the offset at which each line starts.
We can get this table from the fileSet by looking up the file that contains the AST’s start position:
f := fileSet.File(astFile.Pos())
lines := f.Lines()
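One detail worth keeping in mind: a token.Pos is not a plain file offset, because it also includes the file’s base inside the FileSet (1 for the first file added). The sketch below converts a position to an offset with File.Offset before searching the line table, and cross-checks the result against FileSet.Position, which returns the line directly. The helper names lineFromOffsets and firstDeclLine are hypothetical, and Lines requires Go 1.21 or newer:

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// src is a hypothetical stand-in for the target file.
const src = `package demo

// T is a type.
type T struct{}
`

// lineFromOffsets returns the 1-based line that contains offset, given the
// line start offsets reported by (*token.File).Lines.
func lineFromOffsets(lines []int, offset int) int {
	line := 0
	for i, start := range lines {
		if start > offset {
			break
		}
		line = i + 1
	}
	return line
}

// firstDeclLine returns the line of the first declaration in src twice:
// computed from the line table, and as reported by the FileSet.
func firstDeclLine(src string) (fromTable, fromFileSet int, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return 0, 0, err
	}
	if len(file.Decls) == 0 {
		return 0, 0, fmt.Errorf("no declarations found")
	}
	pos := file.Decls[0].Pos()
	f := fset.File(pos)
	fromTable = lineFromOffsets(f.Lines(), f.Offset(pos))
	fromFileSet = fset.Position(pos).Line
	return fromTable, fromFileSet, nil
}

func main() {
	a, b, err := firstDeclLine(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(a, b)
}
```

Both approaches agree; using fset.Position is shorter, while the line table avoids repeated lookups when many positions are resolved.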
Extract Documentation
We also need the documentation attached to the declarations in our file, which the go/doc package can extract:
docs, err := doc.NewFromFiles(fileSet, []*ast.File{astFile}, "./", doc.PreserveAST)
The docs variable now contains the package documentation, which we use later to look up the documentation of the target type.
Note the doc.PreserveAST flag: by default, doc.NewFromFiles takes ownership of the AST and may modify its nodes, but we need them untouched for later analysis.
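To make the role of go/doc concrete, here is a self-contained sketch that looks up the documentation of one type in an in-memory source. The helper name typeDoc, the sample source, and the import path example.com/demo are assumptions for illustration only:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/doc"
	"go/parser"
	"go/token"
	"strings"
)

// src is a hypothetical stand-in for the target file.
const src = `package demo

// Config is a configuration of the target server
//
//go:generate go run ../
type Config struct{}
`

// typeDoc returns the documentation text of the named type, trimmed of
// surrounding whitespace.
func typeDoc(src, typeName string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	// doc.PreserveAST keeps the AST untouched so it can still be walked later.
	pkg, err := doc.NewFromFiles(fset, []*ast.File{file}, "example.com/demo", doc.PreserveAST)
	if err != nil {
		return "", err
	}
	for _, t := range pkg.Types {
		if t.Name == typeName {
			return strings.TrimSpace(t.Doc), nil
		}
	}
	return "", fmt.Errorf("type %s not found", typeName)
}

func main() {
	text, err := typeDoc(src, "Config")
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

The //go:generate line does not show up in the returned text, because directive comments are stripped when a comment group is converted to documentation.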
AST Visitor
We are now ready to walk the AST of the target file and process each node. For this we introduce an AST visitor struct that holds everything we need while visiting nodes:
type walker struct {
	lines          []int           // Line position offset mapping.
	docs           *doc.Package    // Package documentation.
	goGenerateLine int             // Line number of the //go:generate comment.
	pendingLine    bool            // True if the next type node is a target type.
	output         strings.Builder // Output buffer.
}
Find the Trigger Comment
The walker type must implement the Visit method to traverse AST nodes.
This method encapsulates all logic for AST parsing, beginning with comment node processing.
The objective here is to identify the go:generate statement line that matches $GOLINE and set pendingLine to true, indicating that the subsequent struct type is to be used for extracting field documentation.
func (w *walker) Visit(node ast.Node) ast.Visitor {
	switch n := node.(type) {
	case *ast.Comment:
		if !n.Pos().IsValid() {
			return w
		}
		// check if the comment is a //go:generate comment
		text := n.Text
		if !strings.HasPrefix(text, "//go:generate") {
			return w
		}
		// check if the comment is on the same line as $GOLINE
		var line int
		for l, pos := range w.lines {
			if token.Pos(pos) > n.Pos() {
				break
			}
			// $GOLINE env var is 1-based.
			line = l + 1
		}
		if line != w.goGenerateLine {
			return w
		}
		// now we are at the correct line
		w.pendingLine = true
	}
	return w
}
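One caveat about this approach: ast.Walk reaches comments only through the nodes they are attached to, such as the doc comment of a declaration, and it does not traverse the file’s Comments list. A //go:generate line that is separated from the type by a blank line is therefore never passed to Visit. The following sketch illustrates the difference on an in-memory source; the helper name directiveLines and the sample source are hypothetical:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src contains one free-floating directive (line 3) and one directive that is
// part of the doc comment of T (line 7).
const src = `package p

//go:generate echo floating

// T is documented.
//
//go:generate echo attached
type T struct{}
`

// directiveLines returns the lines of //go:generate comments found in two
// ways: by walking the AST, and by scanning file.Comments directly.
func directiveLines(src string) (walked, all []int, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return nil, nil, err
	}
	isDirective := func(c *ast.Comment) bool {
		return strings.HasPrefix(c.Text, "//go:generate")
	}
	// ast.Inspect is a thin wrapper around ast.Walk.
	ast.Inspect(file, func(n ast.Node) bool {
		if c, ok := n.(*ast.Comment); ok && isDirective(c) {
			walked = append(walked, fset.Position(c.Pos()).Line)
		}
		return true
	})
	// file.Comments lists every comment group in the file.
	for _, group := range file.Comments {
		for _, c := range group.List {
			if isDirective(c) {
				all = append(all, fset.Position(c.Pos()).Line)
			}
		}
	}
	return walked, all, nil
}

func main() {
	walked, all, err := directiveLines(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("seen by the walker:", walked)
	fmt.Println("present in file.Comments:", all)
}
```

For the layout used in this article, where the directive is the last line of the type’s doc comment, the walker sees it before it reaches the type node, which is exactly what the pendingLine flag relies on.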
Parse Struct Fields
The final and most intricate part handles the struct type AST node to extract all fields, their types, and documentation.
This logic lives in the Visit method, in the case that follows the comment case:
	case *ast.TypeSpec:
		if !w.pendingLine {
			return w
		}
		// Found the target type; reset the flag so later types are ignored.
		w.pendingLine = false
		name := n.Name.String()
		strct, ok := n.Type.(*ast.StructType)
		if !ok {
			// The directive was not followed by a struct type.
			return w
		}
		w.output.WriteString(name)
		// Find the type documentation.
		var typeDoc *doc.Type
		for _, t := range w.docs.Types {
			if t.Name == name {
				typeDoc = t
				break
			}
		}
		if typeDoc != nil {
			w.output.WriteString(" - ")
			w.output.WriteString(typeDoc.Doc)
		}
		// Append the names, type, and documentation of each field.
		for _, field := range strct.Fields.List {
			if len(field.Names) == 0 {
				// Skip embedded fields.
				continue
			}
			var names []string
			for _, name := range field.Names {
				names = append(names, name.String())
			}
			namesStr := strings.Join(names, ", ")
			// types.ExprString (package go/types) renders any type expression;
			// a bare field.Type.(*ast.Ident) assertion would panic on types
			// such as time.Duration or []string.
			fieldType := types.ExprString(field.Type)
			var fieldDoc string
			if fd := field.Doc; fd != nil {
				fieldDoc = fd.Text()
			}
			fmt.Fprintf(&w.output, " - %s (%s) - %s", namesStr, fieldType, fieldDoc)
		}
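Field types are not always plain identifiers: a qualified type, a slice, or a pointer is represented by a different AST node than *ast.Ident. As a sketch of one way to handle them, types.ExprString from the go/types package renders any type expression back to source form. The helper name fieldSummaries and the sample struct are assumptions made for this example:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a hypothetical struct with non-trivial field types.
const src = `package demo

import "time"

type Config struct {
	Host    string
	Timeout time.Duration
	Tags    []string
	Retries *int
}
`

// fieldSummaries returns one "Name (type)" entry per named field of the
// first struct type found in src.
func fieldSummaries(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	found := false
	ast.Inspect(file, func(n ast.Node) bool {
		if found {
			return false
		}
		st, ok := n.(*ast.StructType)
		if !ok {
			return true
		}
		found = true
		for _, field := range st.Fields.List {
			for _, name := range field.Names {
				// ExprString handles identifiers, selectors, slices, pointers, and so on.
				out = append(out, fmt.Sprintf("%s (%s)", name.Name, types.ExprString(field.Type)))
			}
		}
		return false
	})
	return out, nil
}

func main() {
	summaries, err := fieldSummaries(src)
	if err != nil {
		panic(err)
	}
	for _, s := range summaries {
		fmt.Println(s)
	}
}
```

For the Config struct used in this article the result is the same as reading the identifier name directly, since all of its fields use built-in types.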
Running AST Walker
To run our AST walker:
w := &walker{
	lines:          lines,
	docs:           docs,
	goGenerateLine: targetLine,
}
ast.Walk(w, astFile)
fmt.Println(w.output.String())
We construct it using the line mappings, parsed package documentation, and the go:generate comment line, then invoke ast.Walk with our walker and the astFile. Finally, we print the output:
$ go generate ./.testfiles/target.go
Config - Config is a configuration of the target server
- Host (string) - Host is a host name or IP address of the target server
- Port (int) - Port is a port number of the target server
- Protocol (string) - Protocol is a protocol of the target server
- Timeout (int) - Timeout is a timeout of the target server
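For readers who want to experiment without setting up go generate, the steps can be condensed into one self-contained sketch that works on an in-memory source string. This is not the tool’s actual code: the function name describeStruct is hypothetical, the directive line is passed as a parameter instead of being read from $GOLINE, line numbers come from fset.Position rather than the line table, and the type documentation is left out to keep the example short:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// src is a shortened copy of the target file used in this article.
const src = `package main

// Config is a configuration of the target server
//
//go:generate go run ../
type Config struct {
	// Host is a host name or IP address of the target server
	Host string
	// Port is a port number of the target server
	Port int
}
`

// describeStruct describes the struct declared right after the //go:generate
// comment located at directiveLine (1-based, like $GOLINE).
func describeStruct(src string, directiveLine int) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "target.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	pending := false
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.Comment:
			if strings.HasPrefix(n.Text, "//go:generate") &&
				fset.Position(n.Pos()).Line == directiveLine {
				pending = true
			}
		case *ast.TypeSpec:
			if !pending {
				return true
			}
			pending = false
			st, ok := n.Type.(*ast.StructType)
			if !ok {
				return true
			}
			fmt.Fprintf(&out, "%s:\n", n.Name.Name)
			for _, field := range st.Fields.List {
				for _, name := range field.Names {
					// Text is safe to call on a nil comment group.
					fmt.Fprintf(&out, " - %s (%s) - %s\n", name.Name,
						types.ExprString(field.Type), strings.TrimSpace(field.Doc.Text()))
				}
			}
		}
		return true
	})
	return out.String(), nil
}

func main() {
	// The directive sits on line 5 of src.
	text, err := describeStruct(src, 5)
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```

Calling describeStruct with a line that holds no directive returns an empty string, which mirrors how the walker ignores types that do not follow the trigger comment.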
This approach yields the expected result for our examples, demonstrating the power and flexibility of Go’s AST manipulation capabilities for generating custom documentation and other forms of code analysis.
Conclusion
Through developing the envdoc tool and exploring Go’s go generate and AST parsing capabilities, I’ve uncovered a powerful approach for automating documentation and enhancing code analysis. This journey not only illustrates the utility of Go’s built-in packages for source code manipulation but also highlights the potential for creating tools that significantly improve development workflows.
This experience demonstrates the vast possibilities within Go’s ecosystem for developers to build sophisticated, yet straightforward tools that can lead to more maintainable and self-documenting code. As we look forward, the techniques shared here offer a foundation for further exploration and innovation in code automation and analysis.
I hope this exploration inspires you to delve deeper into Go’s features and consider how its tooling can be applied to your own projects, driving efficiency and code quality to new heights.
References
- The final version of this example and target file source code - examples/go-generate
- My documentation generator tool - github.com/g4s8/envdoc
- go/token documentation - pkg.go.dev/go/token
- go/parser documentation - pkg.go.dev/go/parser
- Go generate documentation - pkg.go.dev/cmd/go/internal/generate
- Blog post about Go generators - go.dev/blog/generate