How to parse a YAML file in Python

Posted in Python by Dirk - last update: Feb 10, 2024

To parse a YAML 1.1 file in Python, you can use the PyYAML library and its safe_load function (the safe default) or its load function with an explicit Loader (for more general use). For YAML 1.2, you can use the ruamel.yaml parser.

What is YAML

YAML (YAML Ain’t Markup Language or, sometimes, Yet Another Markup Language) is a human-readable data serialization format. It’s often used for application configuration files and for data exchange between languages with different data structures.

Here’s an example of data formatted in YAML:

# Sample YAML data
name: John Doe
age: 30
city: New York
skills:
  - Python
  - JavaScript
  - YAML

In this example:

  • The data is represented using key-value pairs.
  • Nested structures are indicated by indentation.
  • Lists are represented using a hyphen followed by a space (-).
  • Comments start with the # symbol.
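
As a rough illustration of how these elements map onto Python objects, the sketch below parses the sample with PyYAML's safe_load (installation is covered further down). The variable names sample and data are only illustrative.

```python
import yaml

# The sample document from above, held in a string for convenience
sample = """\
# Sample YAML data
name: John Doe
age: 30
city: New York
skills:
  - Python
  - JavaScript
  - YAML
"""

data = yaml.safe_load(sample)

# Key-value pairs become a dict, the hyphenated items become a list,
# and the comment line is discarded by the parser.
print(type(data).__name__)  # dict
print(data["age"] + 1)      # age is parsed as an int, so this prints 31
print(data["skills"])       # ['Python', 'JavaScript', 'YAML']
```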

Common use cases for YAML

YAML is used for:

  • Configuration Files: Many applications use YAML for configuration files due to its readability and ease of use. For example, settings for a web server, a database, or a build tool might be defined in a YAML configuration file.
  • Data Serialization: YAML is often used to serialize and deserialize data between different programming languages. It’s more human-readable than JSON and can handle complex data structures.
  • Automation and Orchestration Tools: YAML is commonly used in tools like Ansible, Kubernetes, and Docker Compose for defining infrastructure as code or describing deployment configurations.

Alternative formats to YAML

Alternative data serialization formats:

  • JSON (JavaScript Object Notation): Another widely used format that shares some similarities with YAML. JSON is more strict in its syntax and may be preferred in certain scenarios.
  • XML (eXtensible Markup Language): An older format that is more verbose and less human-readable compared to YAML and JSON. It’s commonly used in legacy systems.
  • TOML (Tom’s Obvious Minimal Language): A configuration file format that aims to be more readable and writable than YAML. It’s gaining popularity, especially in the Rust programming language community.
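
To make the comparison with JSON concrete, here is a small sketch that expresses the same sample record as JSON and parses it with Python's standard json module. The record contents are the sample data from earlier; the names json_text and record are only illustrative.

```python
import json

# The same record as the YAML sample, written as JSON.
# Note the stricter syntax: braces, quoted keys, commas, and no comments.
json_text = """
{
  "name": "John Doe",
  "age": 30,
  "city": "New York",
  "skills": ["Python", "JavaScript", "YAML"]
}
"""

record = json.loads(json_text)
print(record["skills"])  # ['Python', 'JavaScript', 'YAML']
```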

How to Parse a YAML 1.1 file in Python

You can use the PyYAML library to parse YAML files. It is not part of the Python standard library, so you need to install it:

pip install PyYAML

Once you have PyYAML installed, you can use the following example code to parse a YAML file:

import yaml

def parse_yaml_file(file_path):
    try:
        with open(file_path, 'r') as yaml_file:
            data = yaml.safe_load(yaml_file)
            return data
    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found.")
        return None
    except yaml.YAMLError as exc:
        print(f"Error parsing YAML in file '{file_path}': {exc}")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None

# Example usage:
file_path = 'your_file.yaml'

parsed_data = parse_yaml_file(file_path)

if parsed_data is not None:
    # 'parsed_data' now holds the parsed YAML content (a dict when the top level is a mapping)
    print(parsed_data)
else:
    print("Failed to parse YAML file.")

How does it work?

  • try block: The parsing code is placed within a try block. This is where the normal execution occurs.

  • except FileNotFoundError block: Catches the specific exception raised if the file is not found and prints an error message.

  • except yaml.YAMLError block: Catches exceptions related to YAML parsing errors and prints an error message with details about the specific YAML error.

  • except Exception block: Catches any unexpected exceptions that might occur during the parsing process and prints a general error message.

  • return None: In case of an error, the function returns None. You can modify this behavior based on your specific requirements.

  • Example Usage: Calls the parse_yaml_file function with the file path. If parsing is successful, it prints the parsed data; otherwise, it prints an error message.
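
One possible modification of the error handling is sketched below. Because safe_load returns None for an empty document, returning None on failure can be ambiguous, so this variant raises an exception instead. ConfigError and load_yaml_or_raise are hypothetical names made up for this sketch; they are not part of PyYAML.

```python
import yaml


class ConfigError(Exception):
    """Hypothetical error type for this sketch; not part of PyYAML."""


def load_yaml_or_raise(file_path):
    """Parse a YAML file, raising ConfigError instead of returning None on failure."""
    try:
        with open(file_path, "r", encoding="utf-8") as yaml_file:
            return yaml.safe_load(yaml_file)
    except FileNotFoundError as exc:
        raise ConfigError(f"File '{file_path}' not found") from exc
    except yaml.YAMLError as exc:
        raise ConfigError(f"Invalid YAML in '{file_path}': {exc}") from exc
```

A caller can then wrap the call in a single try/except ConfigError block and still treat a None result as "the file was valid but empty".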

Note: In PyYAML, safe_load is used instead of load for security reasons. The safe_load function provides a safer way to load YAML documents by restricting the types of objects that can be constructed during the parsing process.

Using load with an unsafe loader (such as yaml.UnsafeLoader) is risky if you’re loading YAML from untrusted sources, because it can execute arbitrary code during parsing and lead to security vulnerabilities. The safe_load function mitigates this risk by only allowing the creation of basic Python objects like dictionaries, lists, strings, and numbers.

A short example to explain the difference between the two load functions:

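import yaml

# Example YAML with a Python object constructor (dangerous)
yaml_data = "!!python/object/apply:os.system ['echo Hello, World!']"

# Using load with an unsafe loader
# (PyYAML 6.0 and later require the Loader argument)
loaded_data_unsafe = yaml.load(yaml_data, Loader=yaml.UnsafeLoader)
print(loaded_data_unsafe)  # The shell command has already run; this prints its exit status

# Using safe_load (safer)
try:
    loaded_data_safe = yaml.safe_load(yaml_data)
except yaml.YAMLError as exc:
    # safe_load refuses the python/object/apply tag and raises a ConstructorError
    print(f"Rejected by safe_load: {exc}")

In this example, load with an unsafe loader executes the os.system command, which is a security risk. safe_load, on the other hand, does not return None or any other data: it refuses to construct arbitrary Python objects and raises a yaml.constructor.ConstructorError, a subclass of yaml.YAMLError.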

If you are loading YAML from untrusted sources, it is strongly recommended to use safe_load to minimize the risk of code execution vulnerabilities. If you are confident about the source and content of the YAML and understand the potential risks, you might use load with an explicit Loader argument (required since PyYAML 6.0). Always consider the security implications when working with data from external or untrusted sources.
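
As a small sketch of how the two functions relate: passing yaml.SafeLoader to load gives the same restricted behavior as safe_load. The document string and variable names below are made up for illustration.

```python
import yaml

# A made-up configuration snippet
document = "retries: 3\nhosts: [alpha, beta]\n"

# safe_load is shorthand for load with the SafeLoader class
via_safe_load = yaml.safe_load(document)
via_loader = yaml.load(document, Loader=yaml.SafeLoader)

print(via_safe_load)                # {'retries': 3, 'hosts': ['alpha', 'beta']}
print(via_safe_load == via_loader)  # True
```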

How to Parse a YAML 1.2 file in Python

PyYAML primarily supports YAML 1.1, and full support for YAML 1.2 is not guaranteed. YAML 1.2 introduced several changes and clarifications to the specification, and not all of these changes might be fully supported by PyYAML.
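
The sketch below shows a few places where PyYAML's YAML 1.1 behavior is visible. Under the YAML 1.2 core schema, plain yes and no are ordinary strings and 010 is the decimal number 10, so a 1.2 parser would be expected to give different results for the first three keys. The keys and values are made up for illustration.

```python
import yaml

document = """\
country: no
enabled: yes
mode: 010
quoted: "no"
"""

data = yaml.safe_load(document)

# PyYAML applies YAML 1.1 rules: unquoted no/yes are booleans
# and a leading zero marks an octal integer.
print(data)  # {'country': False, 'enabled': True, 'mode': 8, 'quoted': 'no'}
```

Quoting a value, as in the last key, is a simple way to get a string regardless of the YAML version.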

ruamel.yaml is an actively maintained YAML parser/emitter derived from PyYAML. It supports YAML 1.2 and adds features such as round-trip loading, which preserves comments. Its API resembles PyYAML’s but is not identical.

To install ruamel.yaml:

pip install ruamel.yaml

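Usage is similar to PyYAML, but goes through a YAML instance:

from ruamel.yaml import YAML

# The default mode is round-trip, which keeps comments and formatting.
# The older ruamel.yaml.round_trip_load function is deprecated and
# no longer works in recent releases.
yaml = YAML()

with open("your_file.yaml", "r") as yaml_file:
    data = yaml.load(yaml_file)
    print(data)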
