Parser

Struct Parser

pub struct Parser { /* private fields */ }

An incremental parser of a binary WebAssembly module or component.

This type is intended to be used to incrementally parse a WebAssembly module or component as bytes become available for the module. This can also be used to parse modules or components that are already entirely resident within memory.

The primary function of a parser is Parser::parse, which incrementally consumes input. You can also use Parser::parse_all to parse a module or component that is entirely resident in memory.

Implementations

impl Parser

pub fn new(offset: u64) -> Parser

Creates a new parser.

Errors and ranges are reported relative to the offset provided, where offset is a logical offset within the input stream being parsed.
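As an illustration of what a logical offset implies for callers, here is a std-only sketch that is not part of wasmparser. It assumes a module was found at some base position inside a larger stream and that this base was passed to Parser::new, so reported ranges have to be shifted back before they can index the local buffer. The helper name `to_local_range` and its parameters are hypothetical.

```rust
use std::ops::Range;

/// Hypothetical helper (not a wasmparser API): converts a range that was
/// reported relative to `base` (the value given to `Parser::new`) into
/// indices of a local buffer whose first byte sits at stream offset `base`.
///
/// Returns `None` if the range starts before `base`, is inverted, or runs
/// past the end of the local buffer.
fn to_local_range(reported: Range<usize>, base: u64, local_len: usize) -> Option<Range<usize>> {
    // Reject bases that cannot be represented as an index on this target.
    if base > usize::MAX as u64 {
        return None;
    }
    let base = base as usize;

    // Shift both ends back by the base; `checked_sub` catches underflow.
    let start = reported.start.checked_sub(base)?;
    let end = reported.end.checked_sub(base)?;

    if start <= end && end <= local_len {
        Some(start..end)
    } else {
        None
    }
}
```

With a base of 0 the helper is the identity on in-bounds ranges, which matches the common case of parsing a file from its first byte.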

pub fn is_core_wasm(bytes: &[u8]) -> bool

Tests whether bytes looks like a core WebAssembly module.

This will inspect the first 8 bytes of bytes and return true if it starts with the standard core WebAssembly header.

pub fn is_component(bytes: &[u8]) -> bool

Tests whether bytes looks like a WebAssembly component.

This will inspect the first 8 bytes of bytes and return true if it starts with the standard WebAssembly component header.
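The two header checks can be approximated with plain byte comparisons. The following is a std-only sketch rather than wasmparser's implementation: the function names are hypothetical, and the component version and layer bytes are an assumption based on the component-model binary format at the time of writing, which may change as that proposal evolves.

```rust
/// The four magic bytes shared by WebAssembly binaries: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];

/// Version and layer bytes of a core module (version 1, layer 0).
const CORE_VERSION: [u8; 4] = [0x01, 0x00, 0x00, 0x00];

/// Version and layer bytes assumed here for a component (version 0xd,
/// layer 1). This value is an assumption of the sketch.
const COMPONENT_VERSION: [u8; 4] = [0x0d, 0x00, 0x01, 0x00];

/// Returns true if `bytes` begins with the magic number followed by `version`.
fn has_header(bytes: &[u8], version: [u8; 4]) -> bool {
    // `starts_with` returns false for inputs that are too short, so the
    // slice `bytes[4..]` is only evaluated when at least 4 bytes exist.
    bytes.starts_with(&WASM_MAGIC) && bytes[4..].starts_with(&version)
}

/// Hypothetical stand-in for `Parser::is_core_wasm`.
fn looks_like_core_wasm(bytes: &[u8]) -> bool {
    has_header(bytes, CORE_VERSION)
}

/// Hypothetical stand-in for `Parser::is_component`.
fn looks_like_component(bytes: &[u8]) -> bool {
    has_header(bytes, COMPONENT_VERSION)
}
```

Only the first 8 bytes are examined, so a positive result says nothing about whether the rest of the binary is well formed.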

pub fn features(&self) -> WasmFeatures

Returns the currently active set of wasm features that this parser is using while parsing.

The default set of features is WasmFeatures::all() for new parsers.

For more information see BinaryReader::new.

pub fn set_features(&mut self, features: WasmFeatures)

Sets the wasm features active while parsing to the features specified.

The default set of features is WasmFeatures::all() for new parsers.

For more information see BinaryReader::new.

pub fn offset(&self) -> u64

Returns the offset, relative to the original input stream, that this parser is currently at.

pub fn parse<'a>(&mut self, data: &'a [u8], eof: bool) -> Result<Chunk<'a>>

Attempts to parse a chunk of data.

This method will attempt to parse the next incremental portion of a WebAssembly binary. Data available for the module or component is provided as data, and the data can be incomplete if more data has yet to arrive. The eof flag indicates whether more data will ever be received.

There are two ways parsing can succeed with this method:

Chunk::NeedMoreData - this indicates that there are not enough bytes in data to parse a payload. The caller needs to wait for more data to become available before calling this method again. It is guaranteed that this is only returned if eof is false.
Chunk::Parsed - this indicates that a chunk of the input was successfully parsed. This variant carries the payload that was parsed and the number of bytes of data that were consumed. It’s expected that the caller will not provide these bytes back to the Parser again.

Note that all Chunk return values are tied, through a lifetime, to the input buffer: each successfully parsed chunk borrows the input buffer and is a view into it.

It is expected that you’ll call this method until Payload::End is reached, at which point you’re guaranteed that the parse has completed. Note that complete parsing, for the top-level module or component, implies that data is empty and eof is true.

§Errors

Parse errors are returned as an Err. Errors can happen, for example, when the structure of the data is unexpected or when sections are too large. Note that errors are not returned here for malformed contents of sections. Sections are generally not individually parsed, and each returned Payload needs to be iterated over further to detect all errors.

§Examples

An example of reading a wasm file from a stream (std::io::Read) and incrementally parsing it.

use std::io::Read;
use anyhow::Result;
use wasmparser::{Parser, Chunk, Payload::*};

fn parse(mut reader: impl Read) -> Result<()> {
    let mut buf = Vec::new();
    let mut cur = Parser::new(0);
    let mut eof = false;
    let mut stack = Vec::new();

    loop {
        let (payload, consumed) = match cur.parse(&buf, eof)? {
            Chunk::NeedMoreData(hint) => {
                assert!(!eof); // otherwise an error would be returned

                // Use the hint to preallocate more space, then read
                // some more data into our buffer.
                //
                // Note that the buffer management here is not ideal,
                // but it's compact enough to fit in an example!
                let len = buf.len();
                buf.extend((0..hint).map(|_| 0u8));
                let n = reader.read(&mut buf[len..])?;
                buf.truncate(len + n);
                eof = n == 0;
                continue;
            }

            Chunk::Parsed { consumed, payload } => (payload, consumed),
        };

        match payload {
            // Sections for WebAssembly modules
            Version { .. } => { /* ... */ }
            TypeSection(_) => { /* ... */ }
            ImportSection(_) => { /* ... */ }
            FunctionSection(_) => { /* ... */ }
            TableSection(_) => { /* ... */ }
            MemorySection(_) => { /* ... */ }
            TagSection(_) => { /* ... */ }
            GlobalSection(_) => { /* ... */ }
            ExportSection(_) => { /* ... */ }
            StartSection { .. } => { /* ... */ }
            ElementSection(_) => { /* ... */ }
            DataCountSection { .. } => { /* ... */ }
            DataSection(_) => { /* ... */ }

            // Here we know how many functions we'll be receiving as
            // `CodeSectionEntry`, so we can prepare for that, and
            // afterwards we can parse and handle each function
            // individually.
            CodeSectionStart { .. } => { /* ... */ }
            CodeSectionEntry(body) => {
                // here we can iterate over `body` to parse the function
                // and its locals
            }

            // Sections for WebAssembly components
            InstanceSection(_) => { /* ... */ }
            CoreTypeSection(_) => { /* ... */ }
            ComponentInstanceSection(_) => { /* ... */ }
            ComponentAliasSection(_) => { /* ... */ }
            ComponentTypeSection(_) => { /* ... */ }
            ComponentCanonicalSection(_) => { /* ... */ }
            ComponentStartSection { .. } => { /* ... */ }
            ComponentImportSection(_) => { /* ... */ }
            ComponentExportSection(_) => { /* ... */ }

            ModuleSection { parser, .. }
            | ComponentSection { parser, .. } => {
                stack.push(cur.clone());
                cur = parser.clone();
            }

            CustomSection(_) => { /* ... */ }

            // Once we've reached the end of a parser we either resume
            // at the parent parser or we break out of the loop because
            // we're done.
            End(_) => {
                if let Some(parent_parser) = stack.pop() {
                    cur = parent_parser;
                } else {
                    break;
                }
            }

            // most likely you'd return an error here
            _ => { /* ... */ }
        }

        // once we're done processing the payload we can forget the
        // original.
        buf.drain(..consumed);
    }

    Ok(())
}
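The example above notes that its buffer management is not ideal. One possible way to factor it out is sketched below using only the standard library. The helper name `fill_buffer` and the `MAX_READ` cap are assumptions of this sketch, not part of wasmparser. It relies only on the documented contract that the parser asks again with Chunk::NeedMoreData if it still lacks data.

```rust
use std::io::{self, Read};

/// Hypothetical helper: appends at most `hint` bytes from `reader` to `buf`.
///
/// Returns `Ok(true)` if the reader reported end of input (a zero-length
/// read) and `Ok(false)` otherwise. On error, `buf` is left unchanged.
fn fill_buffer(reader: &mut impl Read, buf: &mut Vec<u8>, hint: u64) -> io::Result<bool> {
    // Cap the request so a very large hint cannot force a huge allocation,
    // and ask for at least one byte so an empty read always means EOF.
    const MAX_READ: u64 = 64 * 1024;
    let want = hint.clamp(1, MAX_READ) as usize;

    // Grow the buffer with zeroed space, read into it, then trim the part
    // of the new space that the reader did not fill.
    let len = buf.len();
    buf.resize(len + want, 0);
    let n = match reader.read(&mut buf[len..]) {
        Ok(n) => n,
        Err(e) => {
            buf.truncate(len);
            return Err(e);
        }
    };
    buf.truncate(len + n);
    Ok(n == 0)
}
```

In the loop above, the Chunk::NeedMoreData arm would then reduce to something like `eof = fill_buffer(&mut reader, &mut buf, hint)?; continue;`.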

pub fn parse_all(self, data: &[u8]) -> impl Iterator<Item = Result<Payload<'_>>>

Convenience function that can be used to parse a module or component that is entirely resident in memory.

This function will parse the data provided as a WebAssembly module or component.

Note that when this function yields sections that provide parsers, no further action is required for those sections as payloads from those parsers will be automatically returned.

§Examples

An example of reading a wasm file from a stream (std::io::Read) into a buffer and then parsing it.

use std::io::Read;
use anyhow::Result;
use wasmparser::{Parser, Payload::*};

fn parse(mut reader: impl Read) -> Result<()> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    let parser = Parser::new(0);

    for payload in parser.parse_all(&buf) {
        match payload? {
            // Sections for WebAssembly modules
            Version { .. } => { /* ... */ }
            TypeSection(_) => { /* ... */ }
            ImportSection(_) => { /* ... */ }
            FunctionSection(_) => { /* ... */ }
            TableSection(_) => { /* ... */ }
            MemorySection(_) => { /* ... */ }
            TagSection(_) => { /* ... */ }
            GlobalSection(_) => { /* ... */ }
            ExportSection(_) => { /* ... */ }
            StartSection { .. } => { /* ... */ }
            ElementSection(_) => { /* ... */ }
            DataCountSection { .. } => { /* ... */ }
            DataSection(_) => { /* ... */ }

            // Here we know how many functions we'll be receiving as
            // `CodeSectionEntry`, so we can prepare for that, and
            // afterwards we can parse and handle each function
            // individually.
            CodeSectionStart { .. } => { /* ... */ }
            CodeSectionEntry(body) => {
                // here we can iterate over `body` to parse the function
                // and its locals
            }

            // Sections for WebAssembly components
            ModuleSection { .. } => { /* ... */ }
            InstanceSection(_) => { /* ... */ }
            CoreTypeSection(_) => { /* ... */ }
            ComponentSection { .. } => { /* ... */ }
            ComponentInstanceSection(_) => { /* ... */ }
            ComponentAliasSection(_) => { /* ... */ }
            ComponentTypeSection(_) => { /* ... */ }
            ComponentCanonicalSection(_) => { /* ... */ }
            ComponentStartSection { .. } => { /* ... */ }
            ComponentImportSection(_) => { /* ... */ }
            ComponentExportSection(_) => { /* ... */ }

            CustomSection(_) => { /* ... */ }

            // Once we've reached the end of a parser we either resume
            // at the parent parser or the payload iterator is at its
            // end and we're done.
            End(_) => {}

            // most likely you'd return an error here, but if you want
            // you can also inspect the raw contents of unknown sections
            other => {
                match other.as_section() {
                    Some((id, range)) => { /* ... */ }
                    None => { /* ... */ }
                }
            }
        }
    }

    Ok(())
}

pub fn skip_section(&mut self)

Skip parsing the code section entirely.

This function can be used to indicate, after receiving CodeSectionStart, that the section will not be parsed.

The caller will be responsible for skipping size bytes (found in the CodeSectionStart payload). Bytes should only be fed into parse after the size bytes have been skipped.

§Panics

This function will panic if the parser is not in a state where it’s parsing the code section.

§Examples

use wasmparser::{Result, Parser, Chunk, Payload::*};
use core::ops::Range;

fn objdump_headers(mut wasm: &[u8]) -> Result<()> {
    let mut parser = Parser::new(0);
    loop {
        let payload = match parser.parse(wasm, true)? {
            Chunk::Parsed { consumed, payload } => {
                wasm = &wasm[consumed..];
                payload
            }
            // this state isn't possible with `eof = true`
            Chunk::NeedMoreData(_) => unreachable!(),
        };
        match payload {
            TypeSection(s) => print_range("type section", &s.range()),
            ImportSection(s) => print_range("import section", &s.range()),
            // .. other sections

            // Print the range of the code section we see, but don't
            // actually iterate over each individual function.
            CodeSectionStart { range, size, .. } => {
                print_range("code section", &range);
                parser.skip_section();
                wasm = &wasm[size as usize..];
            }
            End(_) => break,
            _ => {}
        }
    }
    Ok(())
}

fn print_range(section: &str, range: &Range<usize>) {
    println!("{:>40}: {:#010x} - {:#010x}", section, range.start, range.end);
}
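The example above skips the code section with `wasm = &wasm[size as usize..]`, which can panic if the input holds fewer bytes than the section header claims. Below is a std-only sketch of two checked alternatives, one for an in-memory slice and one for an incrementally filled buffer. Both helper names are hypothetical and neither is part of wasmparser.

```rust
/// Hypothetical helper: drops `size` bytes from the front of `input`.
/// Returns `None` instead of panicking when fewer than `size` bytes remain.
fn skip_bytes(input: &[u8], size: u32) -> Option<&[u8]> {
    input.get(size as usize..)
}

/// Hypothetical helper for streaming input: removes up to `*remaining`
/// bytes from the front of `buf` and lowers `*remaining` by the number of
/// bytes removed. Returns `true` once the whole section has been skipped,
/// at which point feeding `buf` to `parse` can resume.
fn skip_buffered(buf: &mut Vec<u8>, remaining: &mut u64) -> bool {
    // Never drain more than the buffer currently holds.
    let n = (*remaining).min(buf.len() as u64) as usize;
    buf.drain(..n);
    *remaining -= n as u64;
    *remaining == 0
}
```

In the slice-based example, a `None` from `skip_bytes` would most likely be turned into an error for truncated input. In a streaming loop, `skip_buffered` would be called after each read until it returns true.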

Trait Implementations

impl Clone for Parser

fn clone(&self) -> Parser

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Parser

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Parser

fn default() -> Parser

Returns the “default value” for a type.

Auto Trait Implementations

impl Freeze for Parser

impl RefUnwindSafe for Parser

impl Send for Parser

impl Sync for Parser

impl Unpin for Parser

impl UnwindSafe for Parser

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.