Advanced deserialization with Serde: Parsing CloudFormation templates

serde is a popular Rust library for serializing and deserializing data structures efficiently. In this article, we will review some of its advanced features by parsing AWS CloudFormation templates.

CloudFormation is an infrastructure-as-code service. You model your AWS resources in a template, and AWS provisions them for you in the correct order. It is a powerful and widely used tool.

General structure

A CloudFormation template consists of the following sections:

Resources - The main section of the template. This section is required and defines the stack of resources to be created.
Conditions - Defines conditions that control the creation of resources and their properties.
Mappings - Contains a dictionary of keys and associated values. You can refer to the mappings in your resources.
Parameters - Defines inputs for your stack. These are the values you supply when the stack is created.
Outputs - Describes the values that the CloudFormation stack should return. This section is used to return properties of created resources.

There are other template sections, but they are omitted because they are not used very often.

Template overview

Have a look at this basic JSON template. It consists of only the Resources section and defines two resources for AWS to create, together with their desired state: a security group and an EC2 instance that refers to it. Spend some time investigating the relationship between resources, properties, and values in the template. We will write a Rust program to parse such templates.

{
    "Resources": {
        "Ec2Instance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "SecurityGroups": [
                    {
                        "Ref": "InstanceSecurityGroup"
                    },
                    "MyExistingSecurityGroup"
                ],
                "KeyName": "mykey",
                "ImageId": "ami-7a11e213"
            }
        },
        "InstanceSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Enable SSH access via port 22",
                "SecurityGroupIngress": [
                    {
                        "IpProtocol": "tcp",
                        "FromPort": 22,
                        "ToPort": 22,
                        "CidrIp": "0.0.0.0/0"
                    }
                ]
            }
        }
    }
}

Application implementation

Basic setup

As always, we start with the cargo command to create an application directory:

cargo new cfn-validator

It will create the basic project setup for you. Open Cargo.toml inside the cfn-validator folder and add these libraries to the dependency list. We will apply serde attributes to our structs, which is why we need the derive feature of the serde library. It allows us to parse the templates without writing much code.

In fact, we will define only the structs, fields, and names. The deserialization logic will be generated automatically by the library.

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.8"
serde_json = "1.0"
anyhow = "1.0"

Template struct

This guide defines the structure from top to bottom, starting from the template’s root.

As mentioned before, the template consists of several sections. Each section has its own structure, which we will define later. Optional sections are wrapped in Option.

use serde::Deserialize;
use std::collections::HashMap;

// Root of the template. Only `resources` is required; every other section is optional.
#[derive(Deserialize, Debug)]
#[serde(rename_all = "PascalCase")]
pub struct Template {
    #[serde(rename = "AWSTemplateFormatVersion")]
    aws_template_format_version: Option<String>,
    metadata: Option<HashMap<String, String>>,
    description: Option<String>,
    mappings: Option<Mapping>,
    parameters: Option<HashMap<String, Parameter>>,
    resources: HashMap<String, Resource>,
    outputs: Option<HashMap<String, Output>>,
}

Have a look at the attributes applied to this struct. Deserialize is a derive macro imported from serde; it generates the deserialization implementation for the struct. Debug is used to print the struct while testing.

#[serde(rename_all = "PascalCase")] is a more interesting attribute. You may have noticed that CloudFormation uses PascalCase names, while Rust guidelines call for snake_case field names. With this attribute, serde maps every snake_case field to its PascalCase key, so we can keep idiomatic names in the code.

But that is not enough for this struct. aws_template_format_version would be mapped to AwsTemplateFormatVersion, which is not the key CloudFormation uses. The #[serde(rename = "AWSTemplateFormatVersion")] attribute overrides the name for this single field. You will see more such attributes later in this guide.
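To see why the explicit rename is needed, the sketch below reimplements the snake_case to PascalCase rule by hand. It is an illustration only: to_pascal_case is a hypothetical helper, not a serde API, and it assumes the rule is "drop each underscore and uppercase the character that follows it".

```rust
/// Hypothetical helper that mimics how `rename_all = "PascalCase"`
/// derives a key from a snake_case field name.
fn to_pascal_case(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    let mut capitalize = true; // the first character is always uppercased
    for ch in field.chars() {
        if ch == '_' {
            // Underscores are dropped; the next character starts a new word.
            capitalize = true;
        } else if capitalize {
            out.push(ch.to_ascii_uppercase());
            capitalize = false;
        } else {
            out.push(ch);
        }
    }
    out
}

#[test]
fn test_pascal_case_rule() {
    assert_eq!(to_pascal_case("cidr_block"), "CidrBlock");
    // Only the first letter of "aws" is uppercased, so the result does not
    // match "AWSTemplateFormatVersion" and the field needs an explicit rename.
    assert_eq!(
        to_pascal_case("aws_template_format_version"),
        "AwsTemplateFormatVersion"
    );
}
```

Acronyms are the usual case where a blanket rename_all rule falls short and a per-field rename is required.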

Value struct

CloudFormation allows us to use different types of values:

Strings
Numbers
Arrays
Intrinsic functions

We need a separate enum to represent these value types. Let’s call it Value:

#[derive(Deserialize, Debug, PartialEq, Eq)]
#[serde(untagged)]
pub enum Value {
    String(String),
    Number(i64),
    #[serde(rename_all = "PascalCase")]
    Ref {
        r#ref: String,
    },
    GetAtt {
        #[serde(rename = "Fn::GetAtt")]
        get_att: Vec<String>,
    },
    Join {
        #[serde(rename = "Fn::Join")]
        join: (String, Vec<Value>),
    },
    Sub {
        #[serde(rename = "Fn::Sub")]
        sub: String,
    },
}

serde can pick the right enum variant from the shape of the data alone. A number becomes Value::Number, a string becomes Value::String, and an intrinsic function becomes its matching variant. The #[serde(untagged)] attribute enables this behavior: serde tries the variants in declaration order and keeps the first one that deserializes successfully.

Let’s define some tests to better understand how it works. These examples use serde_yaml to parse a YAML string into our enum:

#[test]
fn test_deserialize_value_string() {
    let yaml = "some value";
    let expected = Value::String(yaml.to_string());

    let actual: Value = serde_yaml::from_str(yaml).unwrap();
    assert_eq!(expected, actual);
}

#[test]
fn test_deserialize_value_number() {
    let test_cases = [("-5", -5), ("0", 0), ("23123", 23123)];
    for test_case in test_cases {
        let yaml = test_case.0;
        let expected = Value::Number(test_case.1);

        let actual = serde_yaml::from_str(yaml).unwrap();
        assert_eq!(expected, actual);
    }
}

#[test]
fn test_deserialize_value_ref() {
    let yaml = "Ref: 'SSHLocation'";
    let expected = Value::Ref {
        r#ref: "SSHLocation".to_string(),
    };

    let actual = serde_yaml::from_str(yaml).unwrap();
    assert_eq!(expected, actual);
}

#[test]
fn test_deserialize_value_get_att() {
    let yaml = "Fn::GetAtt: [ElasticLoadBalancer, SourceSecurityGroup.OwnerAlias]";
    let expected = Value::GetAtt {
        get_att: vec![
            "ElasticLoadBalancer".to_string(),
            "SourceSecurityGroup.OwnerAlias".to_string(),
        ],
    };

    let actual = serde_yaml::from_str(yaml).unwrap();
    assert_eq!(expected, actual);
}

#[test]
fn test_deserialize_value_join() {
    let yaml = "Fn::Join: ['', ['http://', Fn::GetAtt: [ElasticLoadBalancer, DNSName]]]";
    let expected = Value::Join {
        join: (
            "".to_string(),
            vec![
                Value::String("http://".to_string()),
                Value::GetAtt {
                    get_att: vec!["ElasticLoadBalancer".to_string(), "DNSName".to_string()],
                },
            ],
        ),
    };

    let actual = serde_yaml::from_str(yaml).unwrap();
    assert_eq!(expected, actual);
}

#[test]
fn test_deserialize_value_sub() {
    let yaml = "Fn::Sub: '${AWS::StackName}-VPCID'";
    let expected = Value::Sub {
        sub: "${AWS::StackName}-VPCID".to_string(),
    };

    let actual = serde_yaml::from_str(yaml).unwrap();
    assert_eq!(expected, actual);
}

All tests pass without a single line of hand-written deserialization code! That’s the power of the serde library.

Mappings section

CloudFormation allows us to create a dictionary of values and use it inside the template. Each mapping consists of a key and a value, where the value can be any of the following:

String
List of strings
Another mapping

serde allows us to deserialize recursive structs just like any other struct. The implementation is simple enough:

#[derive(Deserialize, Debug, PartialEq, Eq)]
pub struct Mapping {
    #[serde(flatten)]
    entries: HashMap<String, MappingEntry>,
}

#[derive(Deserialize, Debug, PartialEq, Eq)]
#[serde(untagged)]
pub enum MappingEntry {
    String(String),
    List(Vec<String>),
    Mapping(Mapping),
}

Here we added a new attribute, #[serde(flatten)]. It tells serde that there is no entries key in the input: the keys of the mapping itself become the entries of the HashMap<String, MappingEntry>.

Here are some tests:

#[test]
fn test_deserialize_mappings() {
    let yaml = r#"
Name: Test
NameList:
  - First
  - Second
NameMap:
  first: First
  second:
    - A
    - B
    - C
  third:
    A: B
    B: C
    C: A
    "#;

    let expected = Mapping {
        entries: HashMap::from([
            ("Name".to_string(), MappingEntry::String("Test".to_string())),
            (
                "NameList".to_string(),
                MappingEntry::List(vec!["First".to_string(), "Second".to_string()]),
            ),
            (
                "NameMap".to_string(),
                MappingEntry::Mapping(Mapping {
                    entries: HashMap::from([
                        (
                            "first".to_string(),
                            MappingEntry::String("First".to_string()),
                        ),
                        (
                            "second".to_string(),
                            MappingEntry::List(vec![
                                "A".to_string(),
                                "B".to_string(),
                                "C".to_string(),
                            ]),
                        ),
                        (
                            "third".to_string(),
                            MappingEntry::Mapping(Mapping {
                                entries: HashMap::from([
                                    ("A".to_string(), MappingEntry::String("B".to_string())),
                                    ("B".to_string(), MappingEntry::String("C".to_string())),
                                    ("C".to_string(), MappingEntry::String("A".to_string())),
                                ]),
                            }),
                        ),
                    ]),
                }),
            ),
        ]),
    };
    let mapping: Mapping = serde_yaml::from_str(yaml).unwrap();

    assert_eq!(expected, mapping);
}

Resources section

The Resources section is the only required section and also the most complex one. It contains definitions for many AWS services, so you should refer to the official documentation for all the types and properties. In this guide, we will define a few structs and show how to extend them in the future.

It’s good to create a separate folder resources in your project with the file mod.rs. Inside that file, we will define the top-level Resources section.

But before that, have a look at the sample resource definition once again:

"Ec2Instance": {
    "Type": "AWS::EC2::Instance",
    "Properties": {
        ...
    }
},
"InstanceSecurityGroup": {
    "Type": "AWS::EC2::SecurityGroup",
    "Properties": {
        ...
    }
}

Each resource consists of a key (its logical name), a "Type", and "Properties". Type defines the resource type, and the shape of Properties depends on it.

Now let’s implement it in Rust.

#[derive(Deserialize, Debug, PartialEq, Eq)]
#[serde(rename_all = "PascalCase")]
#[serde(tag = "Type")]
pub enum Resource {
    #[serde(rename = "AWS::EC2::Instance")]
    Ec2(ResourceContainer<Ec2>),
    #[serde(rename = "AWS::EC2::VPC")]
    Vpc(ResourceContainer<Vpc>),
    #[serde(rename = "AWS::SNS::Topic")]
    Topic,
    #[serde(rename = "AWS::AutoScaling::AutoScalingGroup")]
    AutoScalingGroup,
    #[serde(rename = "AWS::AutoScaling::LaunchConfiguration")]
    LaunchConfiguration,
    #[serde(rename = "AWS::AutoScaling::ScalingPolicy")]
    ScalingPolicy,
    #[serde(rename = "AWS::CloudWatch::Alarm")]
    Alarm,
    #[serde(rename = "AWS::ElasticLoadBalancing::LoadBalancer")]
    LoadBalancer,
    #[serde(rename = "AWS::EC2::SecurityGroup")]
    SecurityGroup(ResourceContainer<SecurityGroup>),
}

#[derive(Deserialize, Debug, PartialEq, Eq)]
#[serde(rename_all = "PascalCase")]
pub struct ResourceContainer<T> {
    properties: T,
}

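In this part we added a new attribute, #[serde(tag = "Type")]. serde reads the Type field and selects the enum variant whose name matches it. We also added the generic ResourceContainer struct, which holds the Properties field of each resource. Variants without a payload, such as Topic, are recognized by their Type, but their properties are not parsed yet.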

Here is the test that uses serde_json to deserialize JSON into our struct:

#[test]
fn test_deserialize_resource() {
    let json = r#"
{
    "Type": "AWS::EC2::Instance",
    "Properties": {
        "KeyName": "myKey"
    }
}
    "#;
    let expected = Resource::Ec2(ResourceContainer {
        properties: Ec2 {
            key_name: Some(Value::String("myKey".to_string())),
            security_groups: None,
            image_id: None,
        },
    });

    let actual = serde_json::from_str(json).unwrap();
    assert_eq!(expected, actual);
}

Note how little code this takes. The same structs work with both JSON and YAML without any format-specific code!

VPC Struct

According to the documentation, a VPC is declared with this syntax:

{
  "Type" : "AWS::EC2::VPC",
  "Properties" : {
      "CidrBlock" : String,
      "EnableDnsHostnames" : Boolean,
      "EnableDnsSupport" : Boolean,
      "InstanceTenancy" : String,
      "Ipv4IpamPoolId" : String,
      "Ipv4NetmaskLength" : Integer,
      "Tags" : [ Tag, ... ]
    }
}

Let’s translate it to Rust:

#[derive(Deserialize, Debug, PartialEq, Eq)]
#[serde(rename_all = "PascalCase")]
pub struct Vpc {
    cidr_block: String,
    enable_dns_hostnames: Option<Value>,
    enable_dns_support: Option<Value>,
    instance_tenancy: Option<InstanceTenancy>,
    ipv4_ipam_pool_id: Option<String>,
    ipv4_netmask_length: Option<Value>,
    tags: Option<Vec<Tag>>,
}

#[derive(Deserialize, Debug, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
enum InstanceTenancy {
    Default,
    Dedicated,
    Host,
}

Pay attention to the InstanceTenancy enum. According to the documentation, this property accepts only a fixed set of values, so a separate enum represents it precisely. Also, note that all fields in our struct are optional except cidr_block, and serde will return an error if CidrBlock is missing.

Test implementation is the following:

#[test]
fn test_deserialize_vpc() {
    let json = r#"
{
    "CidrBlock": "10.0.0.0/16",
    "EnableDnsSupport": "true",
    "EnableDnsHostnames": "true",
    "InstanceTenancy": "dedicated",
    "Ipv4NetmaskLength": "28",
    "Tags": [
        {"Key": "stack", "Value": "production"}
    ]
}"#;
    let expected = Vpc {
        cidr_block: "10.0.0.0/16".to_string(),
        enable_dns_support: Some(Value::String("true".to_string())),
        enable_dns_hostnames: Some(Value::String("true".to_string())),
        tags: Some(vec![Tag {
            key: "stack".to_string(),
            value: "production".to_string(),
        }]),
        instance_tenancy: Some(InstanceTenancy::Dedicated),
        ipv4_ipam_pool_id: None,
        ipv4_netmask_length: Some(Value::String("28".to_string())),
    };

    let actual = serde_json::from_str(json).unwrap();
    assert_eq!(expected, actual);
}

CLI Testing

Let’s write a small CLI application to read the JSON template, parse it and output the struct to the console.

use anyhow::Result;
use cfn_validator::{self, Template};
use std::fs;

fn main() -> Result<()> {
    let code = fs::read_to_string("./template.json")?;
    let result: Template = serde_json::from_str(&code)?;
    println!("{:?}", result);

    Ok(())
}

It’s easy enough, right? We need to create a sample template.json:

{
    "Resources": {
        "myVPC": {
            "Type": "AWS::EC2::VPC",
            "Properties": {
                "CidrBlock": "10.0.0.0/16",
                "EnableDnsSupport": "true",
                "EnableDnsHostnames": "true",
                "Tags": [
                    {
                        "Key": "stack",
                        "Value": "production"
                    }
                ]
            }
        }
    }
}

The result of the execution would be the following:

Template { aws_template_format_version: None, metadata: None, description: None, mappings: None, parameters: None, resources: {"myVPC": Vpc(ResourceContainer { properties: Vpc { cidr_block: "10.0.0.0/16", enable_dns_hostnames: Some(String("true")), enable_dns_support: Some(String("true")), instance_tenancy: None, ipv4_ipam_pool_id: None, ipv4_netmask_length: None, tags: Some([Tag { key: "stack", value: "production" }]) } })}, outputs: None }

Further steps

The CloudFormation resource library is vast. You can keep adding more resources if you wish.

Apart from that, you should add a ruleset to verify the correctness of your template:

Check that Ref, GetAtt and FindInMap point to existing targets
Check the validity of properties (some properties accept only numbers, others must match a pattern)
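As a starting point for the first rule, the sketch below finds Ref targets that match neither a resource nor a parameter. It is deliberately independent of serde: the Value enum here is a simplified stand-in for the one defined earlier, and dangling_refs and its input shape (logical ID mapped to a flat list of property values) are assumptions made for illustration.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Simplified stand-in for the article's `Value` enum (assumption: only the
/// variants needed to show the traversal).
enum Value {
    String(String),
    Number(i64),
    Ref(String),
    Join(String, Vec<Value>),
}

/// Collect every `Ref` target reachable from `value`, including nested ones.
fn collect_refs<'a>(value: &'a Value, out: &mut BTreeSet<&'a str>) {
    match value {
        Value::Ref(target) => {
            out.insert(target.as_str());
        }
        Value::Join(_, parts) => parts.iter().for_each(|part| collect_refs(part, out)),
        Value::String(_) | Value::Number(_) => {}
    }
}

/// Return the `Ref` targets that match neither a resource nor a parameter.
/// Pseudo parameters such as `AWS::Region` are always treated as defined.
fn dangling_refs(
    resources: &HashMap<String, Vec<Value>>, // logical ID -> property values
    parameters: &HashSet<String>,
) -> Vec<String> {
    let mut refs = BTreeSet::new();
    for values in resources.values() {
        for value in values {
            collect_refs(value, &mut refs);
        }
    }
    refs.into_iter()
        .filter(|name| !name.starts_with("AWS::"))
        .filter(|name| !resources.contains_key(*name) && !parameters.contains(*name))
        .map(str::to_string)
        .collect()
}
```

The BTreeSet removes duplicates and returns the missing names in sorted order, which keeps error reports stable between runs. Checks for GetAtt and FindInMap could follow the same collect-then-compare pattern.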

Moreover, the current implementation doesn’t support a special format of intrinsic functions inside the YAML templates (!Ref, !GetAtt, etc.). You can add such support to enhance the library.

Summary

In this guide, we learned to use advanced serde features. With a few simple attributes, you can build an application that parses complex JSON, YAML, TOML, and other formats without hand-written deserialization logic.

We used CloudFormation templates as an example of a complex structure.

You can view the final code in my public repo.
