New Resource for Terraform Provider AWS

This post walks through how to add a new resource to terraform-provider-aws that manages the Aurora Database Activity Stream, along with the corresponding PR.

Terraform Architecture

Terraform's runtime is split into two main parts: Terraform Core and Terraform Plugins.

https://raw.githubusercontent.com/Fedomn/misc-blog-assets/master/terraform-Internals-Provider.png

Terraform Core communicates with Terraform Plugins over RPC; HashiCorp implements this RPC-based plugin system in Go with go-plugin.

HashiCorp also provides terraform-plugin-sdk, which makes it straightforward to implement a Terraform Plugin.
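
As a concrete example, a provider binary hands itself to go-plugin through the SDK's plugin.Serve. This is a minimal sketch of what terraform-provider-aws's main.go amounts to (import paths as in the SDK v1 era of this PR):

package main

import (
	"github.com/hashicorp/terraform-plugin-sdk/plugin"
	"github.com/terraform-providers/terraform-provider-aws/aws"
)

func main() {
	// Serve the provider as a go-plugin gRPC server;
	// Terraform Core connects to it as the client.
	plugin.Serve(&plugin.ServeOpts{
		ProviderFunc: aws.Provider,
	})
}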

The main responsibilities of Core and Provider are:

Terraform Core:

  • gRPC client
  • user facing
  • manage the whole resources graph
  • manage state and ask providers to mutate resource state
  • manage providers operations (Create, Read, Update, Delete) order

Terraform Provider:

  • gRPC server
  • execute domain-specific logic
    • Create, Read, Update, Delete, Import, Validate a Resource
    • Read a Data Source
  • provide resource updated state to Terraform Core
  • validate domain-specific inputs and handle errors

Terraform Schemas & Resources

Creating a New Resource

Let's walk through the concepts starting from the code in this PR.

Open the terraform-provider-aws codebase: main.go imports aws.Provider directly, and under the aws package you will find many files following the data_source_aws_*.go and resource_aws_*.go patterns.

This naming convention reflects the two main kinds of components a Provider exposes:

  • Resources: represent a single AWS API object
  • Data Sources: fetch data for an AWS object

The Aurora Activity Stream we want to enable does essentially the same thing as the AWS CLI's start-activity-stream. Since it is a standalone API object, we add a new Resource: resource_aws_rds_cluster_activity_stream.go.

The entry point of the aws provider is the aws.Provider function, shown below:

provider := &schema.Provider{
  DataSourcesMap: map[string]*schema.Resource{
    "aws_acm_certificate": dataSourceAwsAcmCertificate(),
  },
  ResourcesMap: map[string]*schema.Resource{
    "aws_acm_certificate":             resourceAwsAcmCertificate(),
    "aws_rds_cluster_activity_stream": resourceAwsRDSClusterActivityStream(),
  },  
}

All we need to do is add "aws_rds_cluster_activity_stream" and its corresponding *schema.Resource to ResourcesMap.

Defining the Schema

Terraform Schemas define the attributes and behaviors a DataSource or Resource needs, as in the schema.Resource definition below:

type Resource struct {
  // Schema is the schema for the configuration of this resource.
  //
  // The keys of this map are the configuration keys, and the values
  // describe the schema of the configuration value.
  //
  // The schema is used to represent both configurable data as well
  // as data that might be computed in the process of creating this
  // resource.
  Schema map[string]*Schema


  // The functions below are the CRUD operations for this resource.
  Create CreateFunc
  Read   ReadFunc
  Update UpdateFunc
  Delete DeleteFunc
}

The schema and CRUD functions for aws_rds_cluster_activity_stream are defined as follows:

&schema.Resource{
  Create: resourceAwsRDSClusterActivityStreamCreate,
  Read:   resourceAwsRDSClusterActivityStreamRead,
  Delete: resourceAwsRDSClusterActivityStreamDelete,
  Importer: &schema.ResourceImporter{
    State: schema.ImportStatePassthrough,
  },

  Timeouts: &schema.ResourceTimeout{
    Create: schema.DefaultTimeout(120 * time.Minute),
    Delete: schema.DefaultTimeout(120 * time.Minute),
  },

  Schema: map[string]*schema.Schema{
    "resource_arn": {
      Type:         schema.TypeString,
      Required:     true,
      ForceNew:     true,
      ValidateFunc: validateArn,
    },
    "kms_key_id": {
      Type:     schema.TypeString,
      Required: true,
      ForceNew: true,
    },
    "mode": {
      Type:     schema.TypeString,
      Required: true,
      ForceNew: true,
      ValidateFunc: validation.StringInSlice([]string{
        rds.ActivityStreamModeSync,
        rds.ActivityStreamModeAsync,
      }, false),
    },
    "kinesis_stream_name": {
      Type:     schema.TypeString,
      Computed: true,
    },
  },
}

Following the Provider Design Principles, the key names in schema.Schema stay consistent with the aws sdk api.

All of the Resource's inputs (resource_arn, kms_key_id, mode) are Required, which is dictated by StartActivityStreamInput in the underlying aws sdk.

These inputs are also ForceNew, again because of the aws sdk: for an rds activity stream the sdk only offers start and stop requests, and the aws console likewise confirms that an activity stream currently has no update operation. Updating this resource therefore amounts to deleting it and creating it again.

kinesis_stream_name is an output, but it still needs to be defined in schema.Schema so it can eventually be persisted into the state file. What makes it special is the Computed flag, which tells Terraform that this attribute will be computed at creation time.

schema.Schema also accepts functions for controlling how attributes are handled, such as the ValidateFunc used above.
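
To make ValidateFunc concrete, here is a hand-rolled equivalent of the validation.StringInSlice helper used for mode above. This is just an illustrative sketch; the SDK's ValidateFunc signature is func(val interface{}, key string) ([]string, []error):

"mode": {
	Type:     schema.TypeString,
	Required: true,
	ForceNew: true,
	// Illustrative hand-written validator, equivalent to
	// validation.StringInSlice([]string{...}, false).
	ValidateFunc: func(val interface{}, key string) (warns []string, errs []error) {
		v := val.(string)
		if v != rds.ActivityStreamModeSync && v != rds.ActivityStreamModeAsync {
			errs = append(errs, fmt.Errorf("%q must be %q or %q, got: %q",
				key, rds.ActivityStreamModeSync, rds.ActivityStreamModeAsync, v))
		}
		return
	},
},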

Defining the CRUD Operations

We omit the update function because all of the inputs are ForceNew.

Let's look at the create function first:

func resourceAwsRDSClusterActivityStreamCreate(d *schema.ResourceData, meta interface{}) error {
  conn := meta.(*AWSClient).rdsconn

  resourceArn := d.Get("resource_arn").(string)
  kmsKeyId := d.Get("kms_key_id").(string)
  mode := d.Get("mode").(string)

  startActivityStreamInput := &rds.StartActivityStreamInput{
    ResourceArn:      aws.String(resourceArn),
    ApplyImmediately: aws.Bool(true),
    KmsKeyId:         aws.String(kmsKeyId),
    Mode:             aws.String(mode),
  }

  log.Printf("[DEBUG] RDS Cluster start activity stream input: %s", startActivityStreamInput)

  resp, err := conn.StartActivityStream(startActivityStreamInput)
  if err != nil {
    return fmt.Errorf("error creating RDS Cluster Activity Stream: %s", err)
  }

  log.Printf("[DEBUG]: RDS Cluster start activity stream response: %s", resp)

  d.SetId(resourceArn)

  err = waiter.ActivityStreamStarted(conn, d.Id(), d.Timeout(schema.TimeoutCreate))
  if err != nil {
    return err
  }

  return resourceAwsRDSClusterActivityStreamRead(d, meta)
}

The function calls the aws sdk api, retries until the resource finishes creating, then calls the read function to pull the resource's state from aws into schema.ResourceData, which is finally persisted into state.
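waiter.ActivityStreamStarted lives in the provider's internal waiter package. A plausible sketch of it, built on the SDK's resource.StateChangeConf (the exact implementation is in the PR; the status constants come from aws-sdk-go):

func ActivityStreamStarted(conn *rds.RDS, dbClusterArn string, timeout time.Duration) error {
	stateConf := &resource.StateChangeConf{
		Pending: []string{rds.ActivityStreamStatusStarting},
		Target:  []string{rds.ActivityStreamStatusStarted},
		Timeout: timeout,
		// Refresh polls the cluster and reports its current
		// ActivityStreamStatus until it reaches the target state.
		Refresh: func() (interface{}, string, error) {
			out, err := conn.DescribeDBClusters(&rds.DescribeDBClustersInput{
				DBClusterIdentifier: aws.String(dbClusterArn),
			})
			if err != nil {
				return nil, "", err
			}
			if len(out.DBClusters) == 0 {
				return nil, "", nil
			}
			dbc := out.DBClusters[0]
			return dbc, aws.StringValue(dbc.ActivityStreamStatus), nil
		},
	}
	_, err := stateConf.WaitForState()
	return err
}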

This touches a core challenge of IaC: keeping code = cloud resource. Terraform stores cloud resource state in the state file, which serves as the single source of truth for what exists in the cloud. Terraform then diffs state against code, detects drift, and decides what to do next.

How do we avoid drift? Two practices:

  • the read function stores every attribute it reads from the cloud into state, i.e. the attributes in state are exactly what read pulled down
  • after create and update complete, call read to keep state truthful

Imagine there were no read step syncing state with the cloud: whatever attributes create and update wrote would be exactly what landed in state. If an attribute passed to create ended up differing from the corresponding attribute in the cloud while create still succeeded, we would hit the core IaC problem head-on: code and cloud resource out of sync.

With drift in mind, let's look at the read function:

func resourceAwsRDSClusterActivityStreamRead(d *schema.ResourceData, meta interface{}) error {
  conn := meta.(*AWSClient).rdsconn

  input := &rds.DescribeDBClustersInput{
    DBClusterIdentifier: aws.String(d.Id()),
  }

  log.Printf("[DEBUG] Describing RDS Cluster: %s", input)
  resp, err := conn.DescribeDBClusters(input)

  if isAWSErr(err, rds.ErrCodeDBClusterNotFoundFault, "") {
    log.Printf("[WARN] RDS Cluster (%s) not found, removing from state", d.Id())
    d.SetId("")
    return nil
  }

  if err != nil {
    return fmt.Errorf("error describing RDS Cluster (%s): %s", d.Id(), err)
  }

  if resp == nil {
    return fmt.Errorf("error retrieving RDS cluster: empty response for: %s", input)
  }

  var dbc *rds.DBCluster
  for _, c := range resp.DBClusters {
    if aws.StringValue(c.DBClusterArn) == d.Id() {
      dbc = c
      break
    }
  }

  if dbc == nil {
    log.Printf("[WARN] RDS Cluster (%s) not found, removing from state", d.Id())
    d.SetId("")
    return nil
  }

  if aws.StringValue(dbc.ActivityStreamStatus) == rds.ActivityStreamStatusStopped {
    log.Printf("[WARN] RDS Cluster (%s) Activity Stream already stopped, removing from state", d.Id())
    d.SetId("")
    return nil
  }

  d.Set("resource_arn", dbc.DBClusterArn)
  d.Set("kms_key_id", dbc.ActivityStreamKmsKeyId)
  d.Set("kinesis_stream_name", dbc.ActivityStreamKinesisStreamName)
  d.Set("mode", dbc.ActivityStreamMode)

  return nil
}

Fault tolerance: when the resource has disappeared, mark it as deleted with d.SetId("").

State sync: save every activity-stream-related attribute from the DBCluster in the response.
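
For completeness, the delete function (not shown above) mirrors create: it calls StopActivityStream and waits for the stream to stop. A sketch, assuming a waiter.ActivityStreamStopped helper analogous to the started waiter:

func resourceAwsRDSClusterActivityStreamDelete(d *schema.ResourceData, meta interface{}) error {
	conn := meta.(*AWSClient).rdsconn

	stopActivityStreamInput := &rds.StopActivityStreamInput{
		ResourceArn:      aws.String(d.Id()),
		ApplyImmediately: aws.Bool(true),
	}

	log.Printf("[DEBUG] RDS Cluster stop activity stream input: %s", stopActivityStreamInput)

	if _, err := conn.StopActivityStream(stopActivityStreamInput); err != nil {
		return fmt.Errorf("error stopping RDS Cluster Activity Stream: %s", err)
	}

	// Assumed helper, analogous to waiter.ActivityStreamStarted above.
	return waiter.ActivityStreamStopped(conn, d.Id(), d.Timeout(schema.TimeoutDelete))
}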

Acceptance Tests

Acceptance tests exercise real cloud resources; satisfying the Acceptance Test Checklists catches most bugs.

Because real aws resources are involved, account credentials must be prepared in advance. TestSweepers guarantee that the resources the tests create are cleaned up afterwards.
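A sweeper is registered with the SDK's resource.AddTestSweepers; a minimal sketch (the sweeper function here is hypothetical and its body is elided):

func init() {
	resource.AddTestSweepers("aws_rds_cluster_activity_stream", &resource.Sweeper{
		Name: "aws_rds_cluster_activity_stream",
		F:    testSweepRdsClusterActivityStreams,
	})
}

// testSweepRdsClusterActivityStreams is a hypothetical sweeper: it would list
// DB clusters in the region and stop any activity streams left behind by
// earlier test runs.
func testSweepRdsClusterActivityStreams(region string) error {
	// ...DescribeDBClusters and StopActivityStream for leftovers...
	return nil
}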

Pay particular attention to the disappears test, which handles the case where a resource was deleted outside of Terraform and is left dangling.
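
A disappears test deletes the resource out-of-band (here via the resource's own delete function) and expects the next plan to be non-empty. A sketch using the provider's conventional test helpers (testAccPreCheck, testAccProviders, testAccCheckResourceDisappears, and the config/exists/destroy helpers are assumed from the test suite):

func TestAccAWSRDSClusterActivityStream_disappears(t *testing.T) {
	var dbCluster rds.DBCluster
	resourceName := "aws_rds_cluster_activity_stream.test"

	resource.ParallelTest(t, resource.TestCase{
		PreCheck:     func() { testAccPreCheck(t) },
		Providers:    testAccProviders,
		CheckDestroy: testAccCheckAWSRDSClusterActivityStreamDestroy,
		Steps: []resource.TestStep{
			{
				Config: testAccAWSRDSClusterActivityStreamConfig(),
				Check: resource.ComposeTestCheckFunc(
					testAccCheckAWSRDSClusterActivityStreamExists(resourceName, &dbCluster),
					// Delete the activity stream behind Terraform's back.
					testAccCheckResourceDisappears(testAccProvider, resourceAwsRDSClusterActivityStream(), resourceName),
				),
				// The resource is gone, so the follow-up plan must want to
				// recreate it.
				ExpectNonEmptyPlan: true,
			},
		},
	})
}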

Reference

Terraform and The Extensible Provider Architecture

Creating a Terraform Provider for Just About Anything

5 Lessons Learned From Writing Over 300,000 Lines of Infrastructure Code