core#

EBMLite: A lightweight EBML parsing library. It is designed to crawl through EBML files quickly and efficiently, and that’s about it.

class ebmlite.core.BinaryElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML ‘binary’ element. Schema-specific subclasses are generated when a Schema is loaded.

class ebmlite.core.DateElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML ‘date’ element. Schema-specific subclasses are generated when a Schema is loaded.

dtype#

alias of datetime

classmethod encodePayload(data, length=None)#

Type-specific payload encoder for date elements.

Return type:

bytes

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

Return type:

datetime
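To illustrate what a date payload looks like on disk: the EBML specification (RFC 8794) stores a date as a signed 8-byte big-endian integer counting nanoseconds from 2001-01-01T00:00:00 UTC. The sketch below is a standalone approximation of that format, not ebmlite's own code. The names `EBML_EPOCH`, `encode_date`, and `decode_date` are hypothetical, and the sketch assumes naive UTC `datetime` values and truncates to microseconds.

```python
import struct
from datetime import datetime, timedelta

# Assumption: naive datetimes are treated as UTC.
EBML_EPOCH = datetime(2001, 1, 1)


def encode_date(dt):
    """Pack a datetime as signed 64-bit nanoseconds since the EBML epoch."""
    delta = dt - EBML_EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return struct.pack(">q", micros * 1000)


def decode_date(payload):
    """Unpack an 8-byte EBML date payload into a datetime (microsecond precision)."""
    (nanos,) = struct.unpack(">q", payload)
    return EBML_EPOCH + timedelta(microseconds=nanos // 1000)


stamp = datetime(2020, 5, 17, 12, 30, 15)
payload = encode_date(stamp)
restored = decode_date(payload)
```

Dates before 2001 simply produce negative counts, which the signed format allows.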

class ebmlite.core.Document(stream, name=None, size=None, headers=True)#

Base class for an EBML document, containing multiple ‘root’ elements. Loading a Schema generates a subclass.

close()#

Closes the EBML file. If the Document was created using a file/stream (as opposed to a filename), the source file/stream is not closed.

classmethod encode(stream, data, headers=False, **kwargs)#

Encode an EBML document.

Parameters:
  • stream (BinaryIO)

  • data (Union[Dict[str, Any], List[Tuple[str, Any]]]) – The data to encode, provided as a dictionary keyed by element name, or a list of two-item name/value tuples. Note: individual items in a list of name/value pairs must be tuples!

  • headers (bool) – If True, include the standard EBML header element.

Returns:

A bytearray containing the encoded EBML binary.

gc(recurse=False)#

Clear any cached values, to save memory and/or force values to be re-read from the file.

Return type:

int

property type: str#

The document’s type name (i.e. the EBML DocType).

Return type:

str

property value#

An iterator over the document’s root elements. Same as Document.__iter__().

property version: int#

The document’s type version (i.e. the EBML DocTypeVersion).

Return type:

int

class ebmlite.core.Element(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for all EBML elements. Each data type has its own subclass, and these subclasses get subclassed when a Schema is read.

Variables:
  • id – The element’s EBML ID.

  • name – The element’s name.

  • schema – The Schema to which this element belongs.

  • multiple – Can this element appear multiple times? Note: Currently only enforced for encoding.

  • mandatory – Must this element appear in all EBML files using this element’s schema? Note: Not currently enforced.

  • children – A list of valid child element types. Only applicable to Document and Master subclasses. Note: Not currently enforced; only used when decoding ‘infinite’ length elements.

  • dtype – The element’s native Python data type.

  • precache – If True, the Element’s value is read when the Element is parsed. If False, the value is lazy-loaded when needed. Numeric element types default to True. Can be used to reduce the number of file seeks, potentially speeding things up.

  • length – An explicit length (in bytes) of the element when encoding. None will use standard EBML variable-length encoding.

dtype#

alias of bytearray

dump()#

Dump this element’s value as nested dictionaries, keyed by element name. For non-master elements, this just returns the element’s value; this method exists to maintain uniformity.

classmethod encode(value, length=None, lengthSize=None, infinite=False)#

Encode an EBML element.

Parameters:
  • value (Any) – The value to encode, or a list of values to encode. If a list is provided, each item will be encoded as its own element.

  • length (Optional[int]) – An explicit length for the encoded data, overriding the variable length encoding. For producing byte-aligned structures.

  • lengthSize (Optional[int]) – An explicit length for the encoded element size, overriding the variable length encoding.

  • infinite (bool) – If True, the element will be marked as being ‘infinite’. Infinite elements are read until an element is encountered that is not defined as a valid child in the schema.

Return type:

bytes

Returns:

A bytearray containing the encoded EBML data.
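The `lengthSize` parameter overrides EBML's variable-length integer encoding of the element size, in which the position of the first set bit in the leading byte gives the width of the field. The following is a minimal sketch of that size encoding as described in RFC 8794, written independently of ebmlite. `encode_size` and its `width` argument are hypothetical stand-ins (`width` plays the role that `lengthSize` plays above).

```python
def encode_size(value, width=None):
    """Encode an element size as an EBML variable-length integer."""
    if value < 0:
        raise ValueError("size must be non-negative")
    if width is None:
        # Pick the smallest width whose 7 * width data bits can hold the value.
        # The all-ones pattern is reserved for 'unknown size', hence the -1.
        width = 1
        while value >= (1 << (7 * width)) - 1:
            width += 1
    if not 1 <= width <= 8:
        raise ValueError("size needs a width between 1 and 8 bytes")
    if value >= (1 << (7 * width)) - 1:
        raise ValueError(f"{value} does not fit in {width} byte(s)")
    # The marker bit sits just above the data bits.
    marker = 1 << (7 * width)
    return (marker | value).to_bytes(width, "big")


shortest = encode_size(5)         # one byte: marker 0x80 | 5
two_bytes = encode_size(127)      # 127 is reserved in one byte, so two are used
fixed = encode_size(5, width=4)   # explicit width, e.g. for byte alignment
```

A fixed width is useful when a size field must be overwritten later without shifting the data that follows it.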

classmethod encodePayload(data, length=None)#

Type-specific payload encoder.

Return type:

bytes

gc(recurse=False)#

Clear any cached values, to save memory and/or force values to be re-read from the file. Returns the number of cached values cleared.

Return type:

int

getRaw()#

Get the element’s raw binary data, including EBML headers.

Return type:

bytes

getRawValue()#

Get the raw binary of the element’s value.

Return type:

bytes

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

property value#

Parse and cache the element’s value.

class ebmlite.core.FloatElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML floating point element. Schema-specific subclasses are generated when a Schema is loaded.

dtype#

alias of float

classmethod encodePayload(data, length=None)#

Type-specific payload encoder for floating point elements.

Return type:

bytes

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

Return type:

float
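For reference, RFC 8794 defines a float payload as a big-endian IEEE 754 value of either 4 or 8 bytes, with a zero-length payload meaning 0.0. A small sketch of that layout follows; it is an illustration of the format rather than ebmlite's implementation, and `encode_float` / `decode_float` (and the default of 8 bytes) are assumptions made for the example.

```python
import struct


def encode_float(value, length=8):
    """Pack a float as a 4- or 8-byte big-endian IEEE 754 payload."""
    if length == 4:
        return struct.pack(">f", value)
    if length == 8:
        return struct.pack(">d", value)
    raise ValueError("EBML floats are 4 or 8 bytes long")


def decode_float(payload):
    """Unpack a 0-, 4-, or 8-byte EBML float payload."""
    if len(payload) == 0:
        return 0.0
    if len(payload) == 4:
        return struct.unpack(">f", payload)[0]
    if len(payload) == 8:
        return struct.unpack(">d", payload)[0]
    raise ValueError("EBML floats are 0, 4, or 8 bytes long")


single = encode_float(1.5, length=4)
double = encode_float(1.0)
```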

class ebmlite.core.IntegerElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML signed integer element. Schema-specific subclasses are generated when a Schema is loaded.

dtype#

alias of int

classmethod encodePayload(data, length=None)#

Type-specific payload encoder for signed integer elements.

Return type:

bytes

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

Return type:

int
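A signed integer payload in EBML is a big-endian two's-complement value of 0 to 8 bytes. The sketch below shows one way to produce the shortest such payload, or a fixed-width one when an explicit `length` is given. It is a standalone illustration; `encode_int` and `decode_int` are hypothetical names, not part of ebmlite.

```python
def encode_int(value, length=None):
    """Encode a signed integer as big-endian two's complement."""
    if length is None:
        # Grow until the value fits in the signed range of `length` bytes.
        length = 1
        while not (-(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1))):
            length += 1
    return value.to_bytes(length, "big", signed=True)


def decode_int(payload):
    """Decode a big-endian two's-complement payload."""
    return int.from_bytes(payload, "big", signed=True)


minus_one = encode_int(-1)
needs_two = encode_int(128)        # 128 exceeds the signed one-byte range
padded = encode_int(1, length=4)   # explicit length pads with leading zeros
```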

class ebmlite.core.MasterElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML ‘master’ element, a container for other elements.

dtype#

alias of list

dump()#

Dump this element’s value as nested dictionaries, keyed by element name. The values of ‘multiple’ elements return as lists. Note: The order of ‘multiple’ elements relative to other elements will be lost; a file containing elements A1 B1 A2 B2 A3 B3 will result in [A1 A2 A3][B1 B2 B3].

Todo:

Decide if this should be in the util submodule. It is very specific, and it isn’t totally necessary for the core library.

Return type:

Dict[str, Any]
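The ordering caveat above can be seen with a plain-Python sketch of the grouping that a dictionary-based dump implies. This is not the library's code; `dump_children` and its `multiple` argument are hypothetical, and child elements are represented as simple name/value tuples.

```python
def dump_children(children, multiple):
    """Group (name, value) pairs by name; 'multiple' names collect into lists."""
    result = {}
    for name, value in children:
        if name in multiple:
            result.setdefault(name, []).append(value)
        else:
            result[name] = value
    return result


# Interleaved children, in file order: A1 B1 A2 B2 A3 B3
children = [("A", 1), ("B", 10), ("A", 2), ("B", 20), ("A", 3), ("B", 30)]
dumped = dump_children(children, multiple={"A", "B"})
```

The result keeps each name's values in order, but the interleaving between `A` and `B` can no longer be recovered. Iterating over the master element's `value` list preserves it.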

classmethod encode(data, length=None, lengthSize=None, infinite=False)#

Encode an EBML master element.

Parameters:
  • data (Union[Dict[str, Any], List[Tuple[str, Any]]]) – The data to encode, provided as a dictionary keyed by element name, a list of two-item name/value tuples, or a list of either. Note: individual items in a list of name/value pairs must be tuples!

  • length (Optional[int]) – An explicit length for the encoded data, overriding the variable length encoding. For producing byte-aligned structures.

  • lengthSize (Optional[int]) – An explicit length for the encoded element size, overriding the variable length encoding.

  • infinite (bool) – If True, the element will be written with an undefined size. When parsed, its end will be determined by the occurrence of an invalid child element (or end-of-file).

Return type:

bytes

Returns:

A bytearray containing the encoded EBML binary.
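In the EBML specification (RFC 8794), an undefined ("unknown") size is written as a size field whose data bits are all set to 1. The sketch below builds and recognizes that marker for any field width. It illustrates the wire format only: `unknown_size` and `is_unknown_size` are hypothetical helpers, and the width ebmlite actually writes for `infinite=True` is not stated here.

```python
def unknown_size(width=1):
    """Return the reserved 'unknown size' field for a given width in bytes."""
    if not 1 <= width <= 8:
        raise ValueError("width must be between 1 and 8 bytes")
    # Marker bit plus 7 * width data bits, all set to 1.
    return ((1 << (7 * width + 1)) - 1).to_bytes(width, "big")


def is_unknown_size(raw):
    """True if a raw size field is the reserved 'unknown size' pattern."""
    return 1 <= len(raw) <= 8 and raw == unknown_size(len(raw))


one_byte = unknown_size(1)
eight_bytes = unknown_size(8)
```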

classmethod encodePayload(data, length=None)#

Type-specific payload encoder for ‘master’ elements.

gc(recurse=False)#

Clear any cached values, to save memory and/or force values to be re-read from the file.

Return type:

int

parse(*args)#

Type-specific helper function for parsing the element’s payload. This is a special case; parameters stream and size are not used.

Return type:

List[Element]

parseElement(stream, nocache=False)#

Read the next element from a stream, instantiate a MasterElement object, and then return it and the offset of the next element (this element’s position + size).

Parameters:
  • stream (BinaryIO) – The source file-like stream.

  • nocache (bool) – If True, the parsed element’s precache attribute is ignored, and the element’s value will not be cached. For faster iteration when the element value doesn’t matter (e.g. counting child elements).

Return type:

Tuple[Element, int]

Returns:

The parsed element and the offset of the next element (i.e. the end of the parsed element).
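The relationship between an element's header and the "offset of the next element" can be sketched without the library. The code below reads an EBML ID and size from a stream, following RFC 8794, and computes where the next element starts. It is a simplified illustration, not ebmlite's parser: `read_vint` and `read_header` are hypothetical names, and unknown-size elements are not handled.

```python
import io


def read_vint(stream, keep_marker=False):
    """Read one EBML variable-length integer from a binary stream."""
    first = stream.read(1)
    if not first:
        raise EOFError("end of stream")
    lead = first[0]
    width, mask = 1, 0x80
    # The position of the first set bit gives the total width in bytes.
    while width <= 8 and not lead & mask:
        width += 1
        mask >>= 1
    if width > 8:
        raise ValueError("invalid variable-length integer")
    rest = stream.read(width - 1)
    if len(rest) != width - 1:
        raise EOFError("truncated variable-length integer")
    # IDs keep the marker bit; sizes strip it.
    value = lead if keep_marker else lead & (mask - 1)
    return int.from_bytes(bytes([value]) + rest, "big")


def read_header(stream):
    """Return (id, size, payload offset, offset of the next element)."""
    eid = read_vint(stream, keep_marker=True)
    size = read_vint(stream)
    payload_offset = stream.tell()
    return eid, size, payload_offset, payload_offset + size


# Two one-byte elements with IDs 0x4286 and 0x42F7, back to back.
stream = io.BytesIO(bytes.fromhex("42868101" "42f78101"))
first = read_header(stream)
stream.seek(first[3])          # jump to the next element without reading the payload
second = read_header(stream)
```

Skipping straight to the next offset, as done here, is what makes it cheap to walk a file when payload values are not needed.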

property size: int#

The element’s size. Master elements can be instantiated with this as None; this denotes an ‘infinite’ EBML element, and its size will be determined by iterating over its contents until an invalid child type is found, or the end-of-file is reached.

Return type:

int

property value: List[Element]#

Parse and cache the element’s value.

Return type:

List[Element]

class ebmlite.core.Schema(source, name=None)#

An EBML schema, mapping element IDs to names and data types. Unlike the document and element types, this is not a base class; all schemata are actual instances of this class.

Schema instances are typically created by loading an XML schema file using loadSchema() or a byte string using parseSchema().

Variables:
  • document – The schema’s Document subclass.

  • elements – A dictionary mapping element IDs to the schema’s corresponding Element subclasses.

  • elementsByName – A dictionary mapping element names to the schema’s corresponding Element subclasses.

  • elementInfo – A dictionary mapping IDs to the raw schema attribute data. It may have additional items not present in the created element class’ attributes.

  • UNKNOWN – A class/function that handles unknown element IDs. By default, this is the UnknownElement class. Special-case handling can be done by substituting a different class, or an element-producing factory function.

  • source – The source from which the Schema was loaded; either a filename or a file-like stream.

  • filename – The absolute path of the source file, if the source was a file or a filename.

UNKNOWN#

alias of UnknownElement

addElement(eid, ename, baseClass, attribs=None, parent=None, docs=None)#

Create a new Element subclass and add it to the schema.

Duplicate elements are permitted (e.g. if one kind of element can appear in different master elements), provided their attributes do not conflict. The first appearance of an element definition in the schema must contain the required ID, name, and type; successive appearances only need the ID and/or name.

Parameters:
  • eid (int) – The element’s EBML ID.

  • ename (str) – The element’s name.

  • baseClass – The base Element class.

  • attribs (Optional[Dict[str, Any]]) – A dictionary of raw element attributes, as read from the schema file.

  • parent – The new element’s parent element class.

  • docs (Optional[str]) – The new element’s docstring (e.g. the defining XML element’s text content).

encode(stream, data, headers=False)#

Write an EBML document using this Schema to a file or file-like stream.

Parameters:
  • stream (BinaryIO) – The file (or .write()-supporting file-like object) to which to write the encoded EBML.

  • data (Union[Dict[str, Any], List[Tuple[str, Any]]]) – The data to encode, provided as a dictionary keyed by element name, or a list of two-item name/value tuples. Note: individual items in a list of name/value pairs must be tuples!

  • headers (bool) – If True, include the standard EBML header element.

encodes(data, headers=False)#

Create an EBML document using this Schema, returned as bytes.

Parameters:
  • data (Union[Dict[str, Any], List[Tuple[str, Any]]]) – The data to encode, provided as a dictionary keyed by element name, or a list of two-item name/value tuples. Note: individual items in a list of name/value pairs must be tuples!

  • headers (bool) – If True, include the standard EBML header element.

Return type:

bytes

Returns:

A bytes object containing the encoded EBML binary.

load(fp, name=None, headers=False, **kwargs)#

Load an EBML file using this Schema.

Parameters:
  • fp (BinaryIO) – A file-like object containing the EBML to load, or the name of an EBML file.

  • name (Optional[str]) – The name of the document. Defaults to filename.

  • headers (bool) – If False, the file’s EBML header element (if present) will not appear as a root element in the document. The contents of the EBML element will always be read.

Return type:

Document
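A typical use of `load()` is to open a file, walk its root elements, and close the document again. The helper below is a small sketch of that pattern using only members described on this page (`load()`, iteration over the Document, `name`, `dump()`, and `close()`). The function name `summarize` is hypothetical, and `schema` is assumed to be a Schema instance obtained elsewhere, for example from `loadSchema()`.

```python
def summarize(schema, path):
    """Return (name, dumped value) pairs for each root element of an EBML file."""
    # headers=True keeps the EBML header element among the root elements.
    doc = schema.load(path, headers=True)
    try:
        return [(el.name, el.dump()) for el in doc]
    finally:
        # Release the file even if parsing fails part-way through.
        doc.close()
```

Because `dump()` reads every value, this is best suited to small files; for large ones, iterating and inspecting only the elements of interest avoids loading everything.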

loads(data, name=None)#

Load EBML from a byte string using this Schema.

Parameters:
  • data (bytes) – A bytes object or bytearray containing raw EBML data.

  • name (Optional[str]) – The name of the document. Defaults to the Schema’s document class name.

Return type:

Document
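Since `encodes()` returns bytes and `loads()` accepts them, the two can be combined into an in-memory round trip, which is handy for checking that a data structure matches a schema. The sketch below assumes `schema` is a Schema instance and that `data` uses element names defined in that schema; `roundtrip` is a hypothetical helper name.

```python
def roundtrip(schema, data):
    """Encode data with a schema, parse it back, and return both forms."""
    raw = schema.encodes(data, headers=True)
    doc = schema.loads(raw)
    # Dump each root element so the result can be compared with the input.
    return raw, [(el.name, el.dump()) for el in doc]
```

Note the ordering caveat described under MasterElement.dump(): interleaved ‘multiple’ elements will come back grouped by name.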

property type: str#

Schema type name, extracted from EBML DocType default.

Return type:

str

verify(data)#

Perform basic tests on EBML binary data, ensuring it can be parsed using this Schema. Failure will raise an exception.

Return type:

bool

property version: int#

Schema version, extracted from EBML DocTypeVersion default.

Return type:

int

class ebmlite.core.StringElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML ASCII string element. Schema-specific subclasses are generated when a Schema is loaded.

dtype#

alias of str

classmethod encodePayload(data, length=None)#

Type-specific payload encoder for ASCII string elements.

Return type:

bytes

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

Return type:

str

class ebmlite.core.UIntegerElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML unsigned integer element. Schema-specific subclasses are generated when a Schema is loaded.

dtype#

alias of int

classmethod encodePayload(data, length=None)#

Type-specific payload encoder for unsigned integer elements.

Return type:

bytes

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

Return type:

int

class ebmlite.core.UnicodeElement(stream=None, offset=0, size=0, payloadOffset=0)#

Base class for an EBML UTF-8 string element. Schema-specific subclasses are generated when a Schema is loaded.

dtype#

alias of str

classmethod encodePayload(data, length=None)#

Type-specific payload encoder for Unicode string elements.

Return type:

bytes

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

Return type:

str

class ebmlite.core.UnknownElement(stream=None, offset=0, size=0, payloadOffset=0, eid=None, schema=None)#

Special case Unknown element, used for elements with IDs not present in a schema. Unlike other elements, each instance has its own ID.

class ebmlite.core.VoidElement(stream=None, offset=0, size=0, payloadOffset=0)#

Special case Void element. Its contents are ignored and not read; its value is always returned as a run of 0xFF bytes equal in length to the element’s payload. To get the actual contents, use getRawValue().

classmethod encodePayload(data, length=0)#

Type-specific payload encoder for Void elements.

Return type:

bytearray

parse(stream, size)#

Type-specific helper function for parsing the element’s payload. It is assumed the file pointer is at the start of the payload.

Return type:

bytearray

ebmlite.core.loadSchema(filename, reload=False, paths=None, **kwargs)#

Import a Schema XML file. Loading the same file more than once will return the initial instantiation, unless reload is True.

Parameters:
  • filename (Union[str, Path]) – The name of the Schema XML file. If the file cannot be found and the file’s path is not absolute, the paths listed in SCHEMA_PATH will be searched (similar to sys.path when importing modules).

  • reload (bool) – If True, the resulting Schema is guaranteed to be new. Note: existing references to previous instances of the Schema and/or its elements will not update.

  • paths (Optional[str]) – A list of paths to search for schemata, an alternative to ebmlite.SCHEMA_PATH.

Additional keyword arguments are sent verbatim to the Schema constructor.

Raises:

IOError, ModuleNotFoundError

Return type:

Schema
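The lookup order described for `filename` (try the name as given, then fall back to a list of search paths if it is relative) can be sketched as follows. This is a simplified illustration of the idea, not ebmlite's resolver, which may support additional path forms. `find_schema` and the demo file name are hypothetical.

```python
import tempfile
from pathlib import Path


def find_schema(filename, search_paths):
    """Resolve a schema file name, falling back to a list of search directories."""
    candidate = Path(filename)
    if candidate.is_file():
        return candidate.resolve()
    if not candidate.is_absolute():
        # Try each search directory in order, like sys.path for imports.
        for base in search_paths:
            found = Path(base) / candidate
            if found.is_file():
                return found.resolve()
    raise IOError(f"Could not find schema file: {filename}")


# Demo: place a stub schema in a temporary directory and look it up by name.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_schema_sketch.xml").write_text("<Schema/>")
    found_name = find_schema("demo_schema_sketch.xml", [tmp]).name
```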

ebmlite.core.parseSchema(src, name=None, reload=False, **kwargs)#

Read Schema XML data from a string or stream. Loading one with the same name will return the initial instantiation, unless reload is True. Calls to loadSchema() using a name previously used with parseSchema() will also return the previously instantiated Schema.

Parameters:
  • src (str) – The XML string, or a stream containing XML.

  • name (Optional[str]) – The name of the schema. If none is supplied, the name defined within the schema will be used.

  • reload (bool) – If True, the resulting Schema is guaranteed to be new. Note: existing references to previous instances of the Schema and/or its elements will not update.

Additional keyword arguments are sent verbatim to the Schema constructor.

Return type:

Schema
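The caching behaviour described above (the same name returns the same instance unless `reload` is True) follows a common name-keyed cache pattern. A generic sketch of that pattern is shown below; it is not ebmlite's implementation, and `_cache`, `get_schema`, and the `factory` argument are hypothetical.

```python
_cache = {}


def get_schema(name, factory, reload=False):
    """Return the cached object for a name, building it on first use or on reload."""
    if reload or name not in _cache:
        _cache[name] = factory()
    return _cache[name]


first = get_schema("demo", object)
again = get_schema("demo", object)                 # same instance as `first`
fresh = get_schema("demo", object, reload=True)    # new instance replaces the cached one
```

As the `reload` note above warns, anything still holding `first` keeps the old instance; only later lookups see `fresh`.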