Rust作为一门新的系统语言,具有高性能、内存安全可靠、高效特性,越来越受到业界的关注及学习热潮。

learn rust from rustc(LRFRC)系列文章尝试从另外一个角度来学习Rust语言,通过了解编译器rustc基本原理来学习Rust语言,以便更好的理解和分析编译错误提示,更快的写出更好的Rust代码。
通过前面的文章了解Rust语言基础以及相关语法、生成AST语法树、遍历AST语法树、简易宏和过程宏、名称解析等之后,回顾LRFRC系列:快速入门rustc编译器概览中将AST转换成高级中间描述HIR部分的内容,rustc会实现将AST语法树转换成高级中间描述HIR,现对相关内容进行更详细解读;
一、高级中间描述HIR
1.何为高级中间描述HIR
高级中间描述HIR(High-Level Intermediate Representation)作为编译器rustc内部对源代码进行解析生成语法树AST之后另一个重要的中间描述,它用来更进一步面向编译器友好的中间描述,它大部分内容与面向语法的AST树节点类似,但它构建在宏扩展展开和名称解析之后,同时对一些特定语法树节点进行简化或去糖处理比如:去掉圆括号、for语句转换成loop+match+let语句、if let语句转换成match语句、impl Trait转换成带泛化参数的形式等;
另外HIR构建在名称解析之后,其中包含当前Crate对其他被引用Crate的元素的关联引用,同时为了方便查找,以Map的方式来存储当前Crate定义的所有Item元素和函数体Body的定义,并使用索引来访问具体内容;
2.为啥需要HIR
HIR面向编译器友好,方便编译器生成高效二进制代码,为后续编译器高效进行类型分析、类型推导、类型归一化、生成THIR/MIR作准备;
3.HIR主要数据结构d
A.HIR中的标识Id
有一组标识Id用来唯一标识HIR中的节点或定义,它们包括:DefId、LocalDefId、HirId.
一个DefId可用来标识任何一个crate中定义的元素,它由CrateNum和DefIndex索引共同组成;
一个LocalDefId可用来标识当前crate中定义的任何元素,它由DefIndex索引组成,加上默认的值为0的CrateNum组成一个DefId;
一个HirId可用来标识HIR中任何一个节点,它由包含的元素LocalDefId和该元素内的ItemLocalId索引共同组成;
其中DefId和LocalDefId用来标识crate中定义的元素item,不可用来标识表达式;
而HirId用来标识某一个元素item下不同的HIR节点,这样形成以Item为中心下的各种HIR节点具有顺序增长的索引;
这些标识Id与AST中NodeId的主要区别在于:
NodeId是整个AST树唯一的,而LocalDefId和HirId及其中的ItemLocalId,只是在指定Crate或指定Item下的唯一性;
NodeId与HirId的关联映射通过hir::Crate中的node_id_to_hir_id来维护;
// src/librustc_span/def_id.rs
pub const LOCAL_CRATE: CrateNum = CrateNum::Index(CrateId::from_u32(0));
pub enum CrateNum {
/// A special `CrateNum` that we use for the `tcx.rcache` when decoding from
/// the incr. comp. cache.
ReservedForIncrCompCache,
Index(CrateId),
}
/// A `DefId` identifies a particular *definition*, by combining a crate
/// index and a def index.
///
/// You can create a `DefId` from a `LocalDefId` using `local_def_id.to_def_id()`.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Copy)]
pub struct DefId {
pub krate: CrateNum,
pub index: DefIndex,
}
rustc_index::newtype_index! {
/// A DefIndex is an index into the hir-map for a crate, identifying a
/// particular definition. It should really be considered an interned
/// shorthand for a particular DefPath.
pub struct DefIndex {
ENCODABLE = custom // (only encodable in metadata)
DEBUG_FORMAT = "DefIndex({})",
/// The crate root is always assigned index 0 by the AST Map code,
/// thanks to `NodeCollector::new`.
const CRATE_DEF_INDEX = 0,
}
}
/// A LocalDefId is equivalent to a DefId with `krate == LOCAL_CRATE`.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct LocalDefId {
pub local_def_index: DefIndex,
}
// src/librustc_hir/hir_id.rs
/// Uniquely identifies a node in the HIR of the current crate. It is
/// composed of the `owner`, which is the `LocalDefId` of the directly enclosing
/// `hir::Item`, `hir::TraitItem`, or `hir::ImplItem` (i.e., the closest "item-like"),
/// and the `local_id` which is unique within the given owner.
///
#[derive(Encodable, Decodable)]
pub struct HirId {
pub owner: LocalDefId,
pub local_id: ItemLocalId,
}
rustc_index::newtype_index! {
/// An `ItemLocalId` uniquely identifies something within a given "item-like";
pub struct ItemLocalId { .. }
}
/// The `HirId` corresponding to `CRATE_NODE_ID` and `CRATE_DEF_INDEX`.
pub const CRATE_HIR_ID: HirId = HirId {
owner: LocalDefId { local_def_index: CRATE_DEF_INDEX },
local_id: ItemLocalId::from_u32(0),
};
B.hir::Crate及Item
hir::Crate包含根CrateItem和内部包含的items、trait_items、impl_items、bodies等;
// src/librustc_hir/hir.rs
pub struct Crate<'hir> {
pub item: CrateItem<'hir>,
pub exported_macros: &'hir [MacroDef<'hir>],
// Attributes from non-exported macros, kept only for collecting the library feature list.
pub non_exported_macro_attrs: &'hir [Attribute],
// items使用HirId为key,Item为value的Map
pub items: BTreeMap<HirId, Item<'hir>>,
// trait_items
pub trait_items: BTreeMap<TraitItemId, TraitItem<'hir>>,
pub impl_items: BTreeMap<ImplItemId, ImplItem<'hir>>,
pub bodies: BTreeMap<BodyId, Body<'hir>>,
pub trait_impls: BTreeMap<DefId, Vec<HirId>>,
// 一组按顺序输出的bodyid
pub body_ids: Vec<BodyId>,
// 一组按顺序输出的moduels
pub modules: BTreeMap<HirId, ModuleItems>,
// 一组按顺序输出的过程宏
pub proc_macros: Vec<HirId>,
pub trait_map: BTreeMap<HirId, Vec<TraitCandidate>>,
}
// CrateItem包含一个根Mod及其attr、span.
pub struct CrateItem<'hir> {
pub module: Mod<'hir>,
pub attrs: &'hir [Attribute],
pub span: Span,
}
pub struct Mod<'hir> {
pub inner: Span,
// 只包含由ItemId/HirId组成的序列,具体Item内容存储在items map中;
pub item_ids: &'hir [ItemId],
}
pub struct ItemId {
pub id: HirId,
}
pub struct Item<'hir> {
pub ident: Ident,
pub hir_id: HirId,
pub attrs: &'hir [Attribute],
pub kind: ItemKind<'hir>,
pub vis: Visibility<'hir>,
pub span: Span,
}
pub enum ItemKind<'hir> {
/// An `extern crate` item
///
/// E.g., `extern crate foo` or `extern crate foo_bar as foo`.
ExternCrate(Option<Symbol>),
/// `use foo::bar::*;` or `use foo::bar::baz as quux;`
///
/// or just
///
/// `use foo::bar::baz;` (with `as baz` implicitly on the right).
Use(&'hir Path<'hir>, UseKind),
/// A `static` item.
Static(&'hir Ty<'hir>, Mutability, BodyId),
/// A `const` item.
Const(&'hir Ty<'hir>, BodyId),
/// A function declaration.
Fn(FnSig<'hir>, Generics<'hir>, BodyId),
/// A module.
Mod(Mod<'hir>),
/// An external module, e.g. `extern { .. }`.
ForeignMod(ForeignMod<'hir>),
/// Module-level inline assembly (from `global_asm!`).
GlobalAsm(&'hir GlobalAsm),
/// A type alias, e.g., `type Foo = Bar<u8>`.
TyAlias(&'hir Ty<'hir>, Generics<'hir>),
/// An opaque `impl Trait` type alias, e.g., `type Foo = impl Bar;`.
OpaqueTy(OpaqueTy<'hir>),
/// An enum definition, e.g., `enum Foo<A, B> {C<A>, D<B>}`.
Enum(EnumDef<'hir>, Generics<'hir>),
/// A struct definition, e.g., `struct Foo<A> {x: A}`.
Struct(VariantData<'hir>, Generics<'hir>),
/// A union definition, e.g., `union Foo<A, B> {x: A, y: B}`.
Union(VariantData<'hir>, Generics<'hir>),
/// A trait definition.
Trait(IsAuto, Unsafety, Generics<'hir>, GenericBounds<'hir>, &'hir [TraitItemRef]),
/// A trait alias.
TraitAlias(Generics<'hir>, GenericBounds<'hir>),
/// An implementation, e.g., `impl<A> Trait for Foo { .. }`.
Impl {
unsafety: Unsafety,
polarity: ImplPolarity,
defaultness: Defaultness,
defaultness_span: Option<Span>,
constness: Constness,
generics: Generics<'hir>,
/// The trait being implemented, if any.
of_trait: Option<TraitRef<'hir>>,
self_ty: &'hir Ty<'hir>,
items: &'hir [ImplItemRef<'hir>],
},
}
pub struct BodyId {
pub hir_id: HirId,
}
Body不仅仅包含参数,还包含一个表达式<往往为一个块表达式>,和生成模式;
Block中包含Stmt,Stmt中包含Local、Item、Semi、Expr等类型数据;
pub struct Body<'hir> {
pub params: &'hir [Param<'hir>],
pub value: Expr<'hir>,
pub generator_kind: Option<GeneratorKind>,
}
pub struct Expr<'hir> {
pub hir_id: HirId,
pub kind: ExprKind<'hir>,
pub attrs: AttrVec,
pub span: Span,
}
pub enum ExprKind<'hir> {
/// A `box x` expression.
Box(&'hir Expr<'hir>),
/// An array (e.g., `[a, b, c, d]`).
Array(&'hir [Expr<'hir>]),
/// A function call.
/// The first field resolves to the function itself(usually an `ExprKind::Path`),
/// and the second field is the list of arguments.
Call(&'hir Expr<'hir>, &'hir [Expr<'hir>]),
/// A method call (e.g., `x.foo::<'static, Bar, Baz>(a, b, c, d)`).
MethodCall(&'hir PathSegment<'hir>, Span, &'hir [Expr<'hir>], Span),
/// A tuple (e.g., `(a, b, c, d)`).
Tup(&'hir [Expr<'hir>]),
/// A binary operation (e.g., `a + b`, `a * b`).
Binary(BinOp, &'hir Expr<'hir>, &'hir Expr<'hir>),
/// A unary operation (e.g., `!x`, `*x`).
Unary(UnOp, &'hir Expr<'hir>),
/// A literal (e.g., `1`, `"foo"`).
Lit(Lit),
/// A cast (e.g., `foo as f64`).
Cast(&'hir Expr<'hir>, &'hir Ty<'hir>),
/// A type reference (e.g., `Foo`).
Type(&'hir Expr<'hir>, &'hir Ty<'hir>),
/// Wraps the expression in a terminating scope.
DropTemps(&'hir Expr<'hir>),
/// A conditionless loop
/// (can be exited with `break`, `continue`, or `return`).
Loop(&'hir Block<'hir>, Option<Label>, LoopSource),
/// A `match` block, with a source that indicates whether or not it is
/// the result of a desugaring, and if so, which kind.
Match(&'hir Expr<'hir>, &'hir [Arm<'hir>], MatchSource),
/// A closure (e.g., `move |a, b, c| {a + b + c}`).
Closure(CaptureBy, &'hir FnDecl<'hir>, BodyId, Span, Option<Movability>),
/// A block (e.g., `'label: { ... }`).
Block(&'hir Block<'hir>, Option<Label>),
/// An assignment (e.g., `a = foo()`).
Assign(&'hir Expr<'hir>, &'hir Expr<'hir>, Span),
/// An assignment with an operator.E.g., `a += 1`.
AssignOp(BinOp, &'hir Expr<'hir>, &'hir Expr<'hir>),
/// Access of a named (e.g., `obj.foo`) or
/// unnamed (e.g., `obj.0`) struct or tuple field.
Field(&'hir Expr<'hir>, Ident),
/// An indexing operation (`foo[2]`).
Index(&'hir Expr<'hir>, &'hir Expr<'hir>),
/// Path to a definition, possibly containing lifetime or type parameters.
Path(QPath<'hir>),
/// A referencing operation (i.e., `&a` or `&mut a`).
AddrOf(BorrowKind, Mutability, &'hir Expr<'hir>),
/// A `break`, with an optional label to break.
Break(Destination, Option<&'hir Expr<'hir>>),
/// A `continue`, with an optional label.
Continue(Destination),
/// A `return`, with an optional value to be returned.
Ret(Option<&'hir Expr<'hir>>),
/// Inline assembly (from `asm!`), with its outputs and inputs.
InlineAsm(&'hir InlineAsm<'hir>),
/// Inline assembly (from `llvm_asm!`), with its outputs and inputs.
LlvmInlineAsm(&'hir LlvmInlineAsm<'hir>),
/// A struct or struct-like variant literal expression.
/// E.g., `Foo {x: 1, y: 2}`, or `Foo {x: 1, .. base}`,
/// where `base` is the `Option<Expr>`.
Struct(&'hir QPath<'hir>, &'hir [Field<'hir>], Option<&'hir Expr<'hir>>),
/// An array literal constructed from one repeated element.
/// E.g., `[1; 5]`.
Repeat(&'hir Expr<'hir>, AnonConst),
/// A suspension point for generators (i.e., `yield <expr>`).
Yield(&'hir Expr<'hir>, YieldSource),
/// A placeholder for an expression
/// that wasn't syntactically well formed in some way.
Err,
}
/// A statement.
#[derive(Debug, HashStable_Generic)]
pub struct Stmt<'hir> {
pub hir_id: HirId,
pub kind: StmtKind<'hir>,
pub span: Span,
}
/// The contents of a statement.
#[derive(Debug, HashStable_Generic)]
pub enum StmtKind<'hir> {
/// A local (`let`) binding.
Local(&'hir Local<'hir>),
/// An item binding.
Item(ItemId),
/// An expression without a trailing semi-colon (must have unit type).
Expr(&'hir Expr<'hir>),
/// An expression with a trailing semi-colon (may have any type).
Semi(&'hir Expr<'hir>),
}
4.HIR打印
使用下列调试选项dump出带有不同内容的hir;
使用-Z unpretty=hir 可dump出hir内容; 使用-Z unpretty=hir,identified 可dump出带有标识id的hir内容; 使用-Z unpretty=hir,typed 可dump出带有完成类型的hir内容; 使用-Z unpretty=hir-tree 可dump出原生的hir内容;
二、从AST生成HIR
1.触发生成HIR
根据LRFRC系列:快速入门rustc编译器概览中初识rustc编译主流程部分的run_compiler代码, 发现调用queries::expansion()后,其中会触发queries.global_ctxt(), 其中会调用lower_to_hir,进而会调用rustc_ast_lowering::lower_crate方法, 其返回hir::Crate和ResolverOutputs对象,返回值用来创建GlobalContext;
摘要代码如下:
// src/librustc_driver/lib.rs
pub fn run_compiler(
at_args: &[String],
callbacks: &mut (dyn Callbacks + Send),
file_loader: Option<Box<dyn FileLoader + Send + Sync>>,
emitter: Option<Box<dyn Write + Send>>,
) -> interface::Result<()> {
// ..............................
interface::run_compiler(config, |compiler| {
let sess = compiler.session();
let linker = compiler.enter(|queries| {
// 调用queries::parse
queries.parse()?;
// 省略与输出解析结果编译选项的相关逻辑
// 进行名称解析和宏扩展,并检查结果是否正常,否则直接退出
queries.expansion()?;
if callbacks.after_expansion(compiler, queries) ==
Compilation::Stop {
return early_exit();
}
queries.prepare_outputs()?;
// 在expansion及prepare_outputs之后,生成global_ctxt;
queries.global_ctxt()?;
// .....................................
}
// ..................
}
// src/librustc_interface/queries.rs
pub fn global_ctxt(&'tcx self) -> Result<&Query<QueryContext<'tcx>>> {
self.global_ctxt.compute(|| {
let crate_name = self.crate_name()?.peek().clone();
let outputs = self.prepare_outputs()?.peek().clone();
let lint_store = self.expansion()?.peek().2.clone();
// 将AST转换成HIR
let hir = self.lower_to_hir()?.peek();
let dep_graph = self.dep_graph()?.peek().clone();
let (ref krate, ref resolver_outputs) = &*hir;
let _timer = self.session().timer("create_global_ctxt");
Ok(passes::create_global_ctxt(
self.compiler,
lint_store,
krate,
dep_graph,
resolver_outputs.steal(),
outputs,
&crate_name,
&self.gcx,
&self.arena,
))
})
}
pub fn lower_to_hir(&'tcx self) -> Result<&Query<(&'tcx Crate<'tcx>
, Steal<ResolverOutputs>)>> {
self.lower_to_hir.compute(|| {
let expansion_result = self.expansion()?;
let peeked = expansion_result.peek();
let krate = &peeked.0;
let resolver = peeked.1.steal();
let lint_store = &peeked.2;
//使用前面宏展开后的resolver和ast::Crate来进行lower_to_hir
let hir = resolver.borrow_mut().access(|resolver| {
Ok(passes::lower_to_hir(
self.session(),
lint_store,
resolver,
&*self.dep_graph()?.peek(),
&krate,
&self.hir_arena,
))
})?;
let hir = self.hir_arena.alloc(hir);
Ok((hir, Steal::new(BoxedResolver::to_resolver_outputs(resolver))))
})
}
// src/librustc_interface/passes.rs
pub fn lower_to_hir<'res, 'tcx>(
sess: &'tcx Session,
lint_store: &LintStore,
resolver: &'res mut Resolver<'_>,
dep_graph: &'res DepGraph,
krate: &'res ast::Crate,
arena: &'tcx rustc_ast_lowering::Arena<'tcx>,
) -> Crate<'tcx> {
// 省略其他部分代码逻辑
// Lower AST to HIR.
let hir_crate = rustc_ast_lowering::lower_crate(
sess,
&krate,
resolver,
rustc_parse::nt_to_tokenstream,
arena,
);
// 省略其他部分代码逻辑
hir_crate
}
2.ast_lowing中lower_crate
ast_lowing首先生成一个LoweringContext对象,其中定义一组指定数据结构的数据,然后调用其lower_crate方法;
// src/librustc_ast_lowering/lib.rs
pub fn lower_crate<'a, 'hir>(
sess: &'a Session,
krate: &'a Crate,
resolver: &'a mut dyn ResolverAstLowering,
nt_to_tokenstream: NtToTokenstream,
arena: &'hir Arena<'hir>,
) -> hir::Crate<'hir> {
let _prof_timer = sess.prof.verbose_generic_activity("hir_lowering");
LoweringContext {
sess,
resolver,
nt_to_tokenstream,
arena,
items: BTreeMap::new(),
trait_items: BTreeMap::new(),
impl_items: BTreeMap::new(),
bodies: BTreeMap::new(),
trait_impls: BTreeMap::new(),
modules: BTreeMap::new(),
exported_macros: Vec::new(),
non_exported_macro_attrs: Vec::new(),
catch_scopes: Vec::new(),
loop_scopes: Vec::new(),
is_in_loop_condition: false,
is_in_trait_impl: false,
is_in_dyn_type: false,
anonymous_lifetime_mode: AnonymousLifetimeMode::PassThrough,
type_def_lifetime_params: Default::default(),
current_module: hir::CRATE_HIR_ID,
current_hir_id_owner: vec![(LocalDefId { local_def_index: CRATE_DEF_INDEX }, 0)],
// 用来维护不同item下的ItemLocalId
item_local_id_counters: Default::default(),
// ast node id到hir id映射关联
node_id_to_hir_id: IndexVec::new(),
generator_kind: None,
task_context: None,
current_item: None,
lifetimes_to_define: Vec::new(),
is_collecting_in_band_lifetimes: false,
in_scope_lifetimes: Vec::new(),
allow_try_trait: Some([sym::try_trait][..].into()),
allow_gen_future: Some([sym::gen_future][..].into()),
}
.lower_crate(krate)
}
// lower_crate方法通过遍历ast::Crate结合resolver,来生成对应HirId以及hir::Crate/Item/Expr等, 并将其存储在items、trait_items、impl_items、bodies、trait_impls、modules中, 并且保证不同item内容的访问可通过Id来访问;
结合前面对hir::Crate和Item等的定义,加上ItemLowerer遍历器的实现,可以比较直观的理解从AST树节点到HIR的转换, 具体实现可进一步参考相关代码;
impl<'a, 'hir> LoweringContext<'a, 'hir> {
fn lower_crate(mut self, c: &Crate) -> hir::Crate<'hir> {
// 省略MiscCollector定义,它作为AST Visitor来遍历树节点
// 从AST CRATE_NODE_ID 0开始转换,维护根HirId
self.lower_node_id(CRATE_NODE_ID);
debug_assert!(self.node_id_to_hir_id[CRATE_NODE_ID] == Some(hir::CRATE_HIR_ID));
// 使用MiscCollector遍历ast::Crate维护nodeid与对应localdefid
visit::walk_crate(&mut MiscCollector { lctx: &mut self, hir_id_owner: None }, c);
// 使用item::ItemLowerer遍历ast::Crate实现从AST树节点到HIR节点的转换;
visit::walk_crate(&mut item::ItemLowerer { lctx: &mut self }, c);
// 使用ast::Mod的span及items,生成hir::Mod
let module = self.lower_mod(&c.module);
// 生成crate attrs
let attrs = self.lower_attrs(&c.attrs);
// 生成body_ids
let body_ids = body_ids(&self.bodies);
let proc_macros =
c.proc_macros.iter().map(|id| self.node_id_to_hir_id[*id].unwrap()).collect();
let trait_map = self
.resolver
.trait_map()
.iter()
.filter_map(|(&k, v)| {
self.node_id_to_hir_id.get(k).and_then(|id| id.as_ref()).map(|id| (*id, v.clone()))
})
.collect();
// 生成def_id到hir_id的映射
let mut def_id_to_hir_id = IndexVec::default();
for (node_id, hir_id) in self.node_id_to_hir_id.into_iter_enumerated() {
if let Some(def_id) = self.resolver.opt_local_def_id(node_id) {
if def_id_to_hir_id.len() <= def_id.index() {
def_id_to_hir_id.resize(def_id.index() + 1, None);
}
def_id_to_hir_id[def_id] = hir_id;
}
}
// 将生成的def_id/hir_id映射设置到resolver中;
self.resolver.definitions().init_def_id_to_hir_id_mapping(def_id_to_hir_id);
// 根据上面生成的内容创建hir::Crate
hir::Crate {
item: hir::CrateItem { module, attrs, span: c.span },
exported_macros: self.arena.alloc_from_iter(self.exported_macros),
non_exported_macro_attrs: self.arena.alloc_from_iter(self.non_exported_macro_attrs),
items: self.items,
trait_items: self.trait_items,
impl_items: self.impl_items,
bodies: self.bodies,
body_ids,
trait_impls: self.trait_impls,
modules: self.modules,
proc_macros,
trait_map,
}
}
//...................................
}
三、总结及其他
通过介绍HIR相关标识Id以及hir::Crate、Item、Mod、Expr等数据结构,还有lower_crate方法的实现,
可大致理解编译器将AST树转换成一组基于Map类型的HIR数据描述的主要流程,为后续高效的类型推导、分析、归一等作好准备;
参考
更多文章可使用微信扫公众号二维码查看
