A script’s tale – exploring V8

When I started to play around with V8 and tried to understand the basics about how it works internally, I felt pretty lost. The project’s homepage is rather high-level, tells how to make V8 run some „Hello world“ and how one could use it in an own application. But there is not much about the guts, so I started digging into them.

Overview

V8 is a JavaScript engine. As such, what it does is to take and evaluate JavaScript (resp. its ECMAScript implementation). Considering the limited set of built-ins alongside the primitives, this wouldn’t be that exciting. Especially with respect to (impossible) interaction with the outside world. But V8 provides an API which makes it interesting and e.g. led to the noteworthy Node.js framework.

JavaScript environment

To provide a runtime environment, V8 uses contexts. They hold the current state with the global object that owns all global variables. By binding custom object or function templates to the global object, one can extend the set of available objects and functions by arbitrary C++ code.

And this is how Chromium binds its DOM implementation to V8. For those curious about this templates should have a look at the /src/out/Debug/gen/webcore/bindings/V8 directory after building Chromium.

Script execution

Execution of a script is triggered by calling Compile and Run in the Script class. To provide an overview and navigation through what happens between this two calls is what this post is about.

Under the hood

A note/hint: I used the Eclipse CDT to explore V8 (I really learned to appreciate macro expansions, in-place overlaying of method declarations etc.) which are great for this purpose.

When V8 receives a script, it passes four stages: Parsing, AST creation, Compilation and Execution. V8 compiles to native machine code for the target platform, this is one reason for its fast performance.

Script::Compile is implemented in api.cc, and via calling Script::New triggers the compilation process:

// beginning in line 1560
Local<Script> Script::Compile(v8::Handle<String> source,
                              v8::ScriptOrigin* origin,
                              v8::ScriptData* pre_data,
                              v8::Handle<String> script_data) {
  i::Isolate* isolate = i::Isolate::Current();
  ON_BAILOUT(isolate, "v8::Script::Compile()", return Local<Script>());
  LOG_API(isolate, "Script::Compile");
  ENTER_V8(isolate);
  Local<Script> generic = New(source, origin, pre_data, script_data);
  if (generic.IsEmpty())
    return generic;
  i::Handle<i::Object> obj = Utils::OpenHandle(*generic);
  i::Handle<i::SharedFunctionInfo> function =
      i::Handle<i::SharedFunctionInfo>(i::SharedFunctionInfo::cast(*obj));
  i::Handle<i::JSFunction> result =
      isolate->factory()->NewFunctionFromSharedFunctionInfo(
          function,
          isolate->global_context());
  return Local<Script>(ToApi<Script>(result));
}

// beginning in line 1499
Local<Script> Script::New(v8::Handle<String> source,
                          v8::ScriptOrigin* origin,
                          v8::ScriptData* pre_data,
                          v8::Handle<String> script_data) {
  i::Isolate* isolate = i::Isolate::Current();
  // ...
  i::SharedFunctionInfo* raw_result = NULL;
  { i::HandleScope scope(isolate);
    // ...
    i::Handle<i::SharedFunctionInfo> result =
      i::Compiler::Compile(str,
                           name_obj,
                           line_offset,
                           column_offset,
                           NULL,
                           pre_data_impl,
                           Utils::OpenHandle(*script_data),
                           i::NOT_NATIVES_CODE);
    // ...
  }
  i::Handle<i::SharedFunctionInfo> result(raw_result, isolate);
  return Local<Script>(ToApi<Script>(result));
}

The compiler’s Compile implementation captures some statistics, looks up whether it already has a compiled version of it and if not hands over to MakeFunctionInfo:

// beginning in line 486
Handle<SharedFunctionInfo> Compiler::Compile(Handle<String> source,
                                             Handle<Object> script_name,
                                             int line_offset,
                                             int column_offset,
                                             v8::Extension* extension,
                                             ScriptDataImpl* pre_data,
                                             Handle<Object> script_data,
                                             NativesFlag natives) {
  // ...
  if (result.is_null()) {
    // ...
    result = MakeFunctionInfo(&info);
    if (extension == NULL && !result.is_null() && !result->dont_cache()) {
      compilation_cache->PutScript(source, result);
    }
  }
  // ...
  return result;
}

// beginning in line 372
static Handle<SharedFunctionInfo> MakeFunctionInfo(CompilationInfo* info) {

  // ...

  // Parsing
  if (!ParserApi::Parse(info, flags)) {
    return Handle<SharedFunctionInfo>::null();
  }

  // ...

  // Compilation
  if (!MakeCode(info)) {
    if (!isolate->has_pending_exception()) isolate->StackOverflow();
    return Handle<SharedFunctionInfo>::null();
  }

  // ...

  return result;
}

Parsing

The call to the parsing code ends up in parser.cc where a Parser object is prepared for its mission which will eventually start via ParseProgram:

// beginning in line 6034
bool ParserApi::Parse(CompilationInfo* info, int parsing_flags) {
  // ...
  if (info->is_lazy()) {
    ASSERT(!info->is_eval());
    Parser parser(info, parsing_flags, NULL, NULL);
    if (info->shared_info()->is_function()) {
      result = parser.ParseLazy();
    } else {
      result = parser.ParseProgram();
    }
  } else {
    ScriptDataImpl* pre_data = info->pre_parse_data();
    Parser parser(info, parsing_flags, info->extension(), pre_data);
    if (pre_data != NULL && pre_data->has_error()) {
      // ...
    } else {
      result = parser.ParseProgram();
    }
  }
  info->SetFunction(result);
  return (result != NULL);
}

The Parser’s implementation is also part of parser.cc. The methods ParseProgram and  ParseLazy can be found in the beginning of the file (currently around line nr. 530). They both somehow walk over the given source code and build an Abstract Syntax Tree from it which is stored in a particular AST node, a FunctionLiteral. That is, a script gets an enclosing function which returns the result of the last statement. Script::Run will execute this function.

The Abstract Syntax Tree

As parsing the code and building the AST are interleaved, there is not much to say about the AST’s creation here. Instead I’ll explain its building blocks. An AST consists of AstNodes which belong to one of five node types.

To get a first understanding of the basic structure of JavaScript programs, it is sufficient to focus on Statements and Expressions. Probably everyone (if not more) who ever dealt with the theory of programming language’s syntax and semantics knows these concepts. Statements drive the control flow while Expressions mainly cause data flow.

A FunctionLiteral

As said before, scripts are transformed into FunctionLiterals, which are the root of the according ASTs. A FunctionLiteral has a body which is a list of Statements which correspond to the actual statements of the source code but transformed into a tree structure.

For an assignment in the source, this means it becomes an ExpressionStatement wrapping an Expression of type Assignment. An Assignment itself has two further Expressions as children, the left-hand side target and the right-hand side value. The nesting may be arbitrarily deep until an Expression is reached that itself has no child anymore.

One can inspect generated ASTs using the --print-ast flag for the debug shell D8. After building V8 it can be invoked from the main directory by out/x64.debug/d8 --print-ast foo.js. The following example shows the AST for an factorial function:

function factorial(n){
    var result;
    if (n == 1 || n == 0)
        result = 1;
    else
        result = n*factorial(n - 1);
    return result;
}
FUNC
. NAME "factorial"
. INFERRED NAME ""
. PARAMS
. . VAR (mode = VAR) "n"
. DECLS
. . VAR (mode = VAR) "result"
// actual beginning of the body
. BLOCK INIT
. IF
. . OR
. . . EQ
. . . . VAR PROXY parameter[0] (mode = VAR) "n"
. . . . LITERAL 1
. . . EQ
. . . . VAR PROXY parameter[0] (mode = VAR) "n"
. . . . LITERAL 0
. THEN
. . ASSIGN
. . . VAR PROXY local[0] (mode = VAR) "result"
. . . LITERAL 1
. ELSE
. . ASSIGN
. . . VAR PROXY local[0] (mode = VAR) "result"
. . . MUL
. . . . VAR PROXY parameter[0] (mode = VAR) "n"
. . . . CALL
. . . . . VAR PROXY (mode = DYNAMIC_GLOBAL) "factorial"
. . . . . SUB
. . . . . . VAR PROXY parameter[0] (mode = VAR) "n"
. . . . . . LITERAL 1
. RETURN
. . VAR PROXY local[0] (mode = VAR) "result"

Compilation

When the AST is ready, the next stage begins with MakeCode back in compiler.cc:

static bool MakeCode(CompilationInfo* info) {
  // Precondition: code has been parsed.  Postcondition: the code field in
  // the compilation info is set if compilation succeeded.
  ASSERT(info->function() != NULL);
  return Rewriter::Rewrite(info) && Scope::Analyze(info) && GenerateCode(info);
}

Rewrite

From what I observed, Rewriter::Rewrite in rewriter.cc simply travels through the AST to insert Return statements if there are none explicitly defined (in this case the result of the last Statement is returned).

Analyze

In JavaScript, Scopes are used to bind and look up variables. Each function gets its own Scope. When a function is called, its scope is chained with the calling scope. To look up a variable, V8 travels down the scope chain until it finds the variable. This way, variable shadowing naturally is covered by the used data structures.

What I believe Scope::Analyze does, is to record as much information about the encountered variables as statically possible to support the compilation process, i.e. allow speed-ups in the generated code. There possibly is more behind that as V8 has sophisticated optimization techniques – see this wingolog atricles for more: 2 compilers, Crankshaft and Lithium.

GenerateCode

When a function is compiled for the first time, this is business of the FullCodegen compiler (throughout runtime, V8 may decide to optimize certain functions and this is where I definitely would refer you to wingolog). GenerateCode calls its FullCodeGenerator::MakeCode method and from there on it gets platform specific, because the called Generate method is included from the platform-specific directories, e.g. here for x64, at compile time.

// beginning in line 289
bool FullCodeGenerator::MakeCode(CompilationInfo* info) {
  // ...
  FullCodeGenerator cgen(&masm, info);
  cgen.Generate();
  //..
  return !code.is_null();
}

 

To find out, how V8 exactly assembles the machine code, have fun and go on digging. For me this was the end of the journey as my interest is focused on the platform-independent parts of V8.

Comments (1)

mSeptember 17th, 2013 at 17:41

Really helpful article –

Leave a comment

Your comment

(required)