Mercari Engineering Blog

We're the software engineers behind Mercari. Check out our blog to see the tech that powers our marketplace.

Write Your Own Go Linters with Parser Package

This is the seventh post of Merpay Advent Calendar 2019 from Adler, Backend Engineer in Payment Platform team at Merpay.

What's a Linter? Why Do I Need One?

Linters are tools that help to improve code readability, consistency, and maintainability by catching code smell and anti-patterns. Linters play an important role in the development process by helping to reduce the time required for developers to review trivial issues such as typo, function length, and useless arguments. Linters are also useful when integrated into development processes like CI.

There are many powerful linters in the Go community like gofmt and dupl. There's also a curated list of useful linters by GolangCI. You probably won't need to use all of them, but introducing one or two might hugely boost your team's productivity. Since it's the main programming language in Merpay, we also encourage teams to integrate linters into their workflow based on requirements.

There are, however, situations where creating a custom linter could give you great value. For instance, if your company has a job interview assignment that requires a candidate to write benchmark tests for specific functions, reviewers would need to go through all cases using text search to see if they meet the criteria. Instead of doing this manually, it can be easily done with a custom linter that automatically looks for a benchmark test that includes a specific function call. Tools like this can save some time for reviewers from checking minor problems.

Wondering how to build one? Here's how.

Let's Get Started: The Basics

Before we can build something useful, we need to know how it works. For parsing Go source code, 3 packages are required: parser, ast, and token. Basically, we'll need to break down the code into an abstract syntax tree before we can start an analysis. Each piece of code is treated as a node. A node might represent a function declaration, a variable assignment, or simply a comment.

Let's say we have a file called "foo.go" with a basic Foo function:

package foo

import "fmt"

func Foo() {
    fmt.Println("Hello, Foo")
}

This file can be broken down into 3 parts: package, import, and a function. We'll retrieve the information for each part using the parser package as follows.

package main

import (
    "fmt"
    "go/parser"
    "go/token"
)

func main() {
    // Supposedly it's a subdirectory within the current directory
    fpath := "./foo/foo.go"
    fset := token.NewFileSet()

    file, err := parser.ParseFile(fset, fpath, nil, parser.Mode(0))
    if err != nil {
        panic(err)
    }

    // A. The package name
    fmt.Println("package name:", file.Name)

    // B. The import
    imports := file.Imports
    // The type []*ImportSpec so we need to loop through the slice to
    // properly go through each package being imported.
    for _, i := range imports {
        path := *i.Path
        fmt.Println(path.Value)
    }

    // C. The function
    //
    // file.Decls returns a slice of declarations. Looping through it
    // to retrieve the function declaration.
    for _, decl := range file.Decls {
        switch d := decl.(type) {
        // There are many more other types but we only focus on FuncDecl here.
        case *ast.FuncDecl:
            fmt.Println("function declaration:", d.Name)
        }
    }
}

Running the program gives us this result:

> go run main.go
package name: foo
"fmt"
function declaration: &{<nil> <nil> Foo 0xc0000a6120 0xc00009a1e0}

When a file is parsed, you get a struct that contains a lot of information back. A detailed description can be found in the package documentation, but here's a highlight of what we have used:

type File struct {
    Name       *Ident          // package name
    Decls      []Decl          // top-level declarations; or nil
    Imports    []*ImportSpec   // imports in this file

    // other fields are hidden
}

The Name field contains the name of the package. Imports focuses only on the import section of the file, and lastly, Decls contains everything else that is declared in the file, including but not limited to structs, variables, and functions.

Hands-On: Checking for Unnecessary Newlines

In the previous section, we quickly covered the fundamentals of ASTs in Go. However, the only way to learn to use it is to build a working library.

The objective here is to build a minimal linter that checks for unnecessary newlines at the beginning of a function. Once detected, our linter should print a message. Here is an example of a problematic file:

package foo

import "fmt"

// extra newline
func Foo() {

    fmt.Println("Hello Foo")
}

// no extra newline
func Bar() {
    fmt.Println("Hello Bar")
}

The program we are building should indicate the existence of the unnecessary newline at the beginning of the Foo function, but not print any message about the Bar function.

If we were to analyze the input as we did in the previous section, the code would not be readable because the structure would become very complicated. We do not want to stuff our implementation with a ridiculous amount of switch statements. So, in order to do it in a more elegant way, we will introduce the use of ast.Visitor, which walks through all nodes in a file and builds a syntax tree. Within that tree, it is a lot easier to differentiate between nodes and tokens.

Using a visitor would look like the following:

package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
)

func main() {
    fpath := "./foo/foo.go"
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, fpath, nil, parser.Mode(0))
    if err != nil {
        fmt.Println(err)
        return
    }

    var v visitor
    ast.Walk(v, f)
}

type visitor struct{}

// Visit function walks through each node in a file
func (v visitor) Visit(n ast.Node) ast.Visitor {
    if n == nil {
        return nil
    }

    fmt.Printf("%T\n", n)
    return v
}

Once run, the result would be:

> go run main.go
*ast.File
*ast.Ident
*ast.GenDecl
*ast.ImportSpec
*ast.BasicLit
*ast.FuncDecl
*ast.Ident
*ast.FuncType
*ast.FieldList
*ast.BlockStmt
*ast.ExprStmt
*ast.CallExpr
*ast.SelectorExpr
*ast.Ident
*ast.Ident
*ast.BasicLit

By using the Visit function, we will be able to see each node and its purpose. They are not very expressive by themselves so here is a picture showing which nodes we are using in this program:

f:id:adlerhsieh:20191204165832p:plain
Node Explanations

Note that the fmt.Println function call includes multiple nodes, but due to the fact that this program is only involved with newlines, most nodes are ignored.

Unfortunately, the node list does not contain an identifier for a newline, so our implementation will check the difference between:

1) The line number of the beginning of a block statement, i.e. "{".

2) The line number of the first statement within the block.

Essentially, the line number difference should be larger by one. Any number exceeding that points to the fact that there is one or more blank line(s).

After understanding the concept, the following is the complete source code, with comments, to achieve our goal.

package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
)

func main() {
    fpath := "./foo/foo.go"
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, fpath, nil, parser.Mode(0))
    if err != nil {
        fmt.Println(err)
        return
    }

    v := visitor{
        fset:      fset,
        funcNames: make(map[*ast.BlockStmt]*ast.Ident),
    }

    ast.Walk(v, f)
}

type visitor struct {
    // We need this field to save the fileset as a reference for line numbers.
    fset *token.FileSet
    // When a function is caught having an unnecessary line, the function name is retrieved here.
    funcNames map[*ast.BlockStmt]*ast.Ident
}

func (v visitor) Visit(node ast.Node) ast.Visitor {
    // nil node is skipped as it is irrelevant to our goal
    if node == nil {
        return nil
    }

    // Once we find a function, we save the function name in
    // a map using its body statement as a pointer key.
    if funcDecl, ok := node.(*ast.FuncDecl); ok {
        blockStmt := funcDecl.Body
        v.funcNames[blockStmt] = funcDecl.Name
    }

    if blockStmt, ok := node.(*ast.BlockStmt); ok {
        // Find the line number of the beginning of a block statement.
        stmtStartingPosition := blockStmt.Pos()
        stmtLine := v.fset.Position(stmtStartingPosition).Line

        // Find the line number of the first statement in the block.
        firstStmt := blockStmt.List[0]
        firstStmtStartingPosition := firstStmt.Pos()
        firstStmtLine := v.fset.Position(firstStmtStartingPosition).Line

        // The difference should be one. Newlines exist when it is larger.
        if stmtLine+1 < firstStmtLine {
            // Retrieve the function name with the pointer key we saved earlier,
            // and print it.
            funcName := v.funcNames[blockStmt]
            fmt.Printf("Unnecessary newline at the beginning: %s\n", funcName)
        }
    }

    return v
}

Running the code will print out the result:

> go run main.go
Unnecessary newline at the beginning: Foo

Looks great! You can try it out yourself on Go Playground.

Remember that this is an over-simplified solution. There are many scenarios that cannot be solved in this way. There is an existing library, whitespace, for solving the exact issue using a more structured approach. Check its source code out if you want to bring our tool to the next level.

For Merpay Advent Calendar, tomorrow Backend Engineer @knsh14 will bring you "バッチコマンドをテストしやすいようにリファクタリングする". Please be looking forward to it :)