New Resource for Terraform Provider AWS

This post walks through how to add a new resource to terraform-provider-aws that manages the Aurora Database Activity Stream, along with the corresponding PR.

Terraform Architecture

Terraform's runtime is split into two main parts: Terraform Core and Terraform Plugins.

https://raw.githubusercontent.com/Fedomn/misc-blog-assets/master/terraform-Internals-Provider.png

Terraform Core communicates with Terraform Plugins over RPC; HashiCorp implements this RPC-based plugin system in Go with go-plugin.

HashiCorp also provides terraform-plugin-sdk, which makes it straightforward to implement a Terraform Plugin.
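
As a concrete example, a provider binary hands itself to go-plugin through the SDK's plugin.Serve. This is a minimal sketch of what terraform-provider-aws's main.go amounts to (import paths as in the SDK v1 era of this PR):

package main

import (
	"github.com/hashicorp/terraform-plugin-sdk/plugin"
	"github.com/terraform-providers/terraform-provider-aws/aws"
)

func main() {
	// Serve the provider as a go-plugin gRPC server;
	// Terraform Core connects to it as the client.
	plugin.Serve(&plugin.ServeOpts{
		ProviderFunc: aws.Provider,
	})
}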

The main responsibilities of Core and Provider are:

Terraform Core:

  • gRPC client
  • user facing
  • manage the whole resources graph
  • manage state and ask providers to mutate resource state
  • manage providers operations (Create, Read, Update, Delete) order

Terraform Provider:

  • gRPC server
  • execute domain-specific logic
    • Create, Read, Update, Delete, Import, Validate a Resource
    • Read a Data Source
  • provide resource updated state to Terraform Core
  • validate domain-specific inputs and handle errors

Terraform Schemas & Resources

Creating a New Resource

Let's walk through the concepts starting from the code in this PR.

Open the terraform-provider-aws codebase: main.go imports aws.Provider directly, and under the aws package you will find many files following the data_source_aws_*.go and resource_aws_*.go patterns.

This naming convention reflects the two main kinds of components a Provider exposes:

  • Resources: represent a single AWS API object
  • Data Sources: fetch data for an AWS object

The Aurora Activity Stream we want to enable does essentially the same thing as the AWS CLI's start-activity-stream. Since it is a standalone API object, we add a new Resource: resource_aws_rds_cluster_activity_stream.go.

The entry point of the aws provider is the aws.Provider function, shown below:

provider := &schema.Provider{
  DataSourcesMap: map[string]*schema.Resource{
    "aws_acm_certificate": dataSourceAwsAcmCertificate(),
  },
  ResourcesMap: map[string]*schema.Resource{
    "aws_acm_certificate":             resourceAwsAcmCertificate(),
    "aws_rds_cluster_activity_stream": resourceAwsRDSClusterActivityStream(),
  },  
}

All we need to do is add "aws_rds_cluster_activity_stream" and its corresponding *schema.Resource to ResourcesMap.

Defining the Schema

Terraform Schemas define the attributes and behaviors a DataSource or Resource needs, as in the schema.Resource definition below:

type Resource struct {
  // Schema is the schema for the configuration of this resource.
  //
  // The keys of this map are the configuration keys, and the values
  // describe the schema of the configuration value.
  //
  // The schema is used to represent both configurable data as well
  // as data that might be computed in the process of creating this
  // resource.
  Schema map[string]*Schema


  // The functions below are the CRUD operations for this resource.
  Create CreateFunc
  Read   ReadFunc
  Update UpdateFunc
  Delete DeleteFunc
}

The schema and CRUD functions for aws_rds_cluster_activity_stream are defined as follows:

&schema.Resource{
  Create: resourceAwsRDSClusterActivityStreamCreate,
  Read:   resourceAwsRDSClusterActivityStreamRead,
  Delete: resourceAwsRDSClusterActivityStreamDelete,
  Importer: &schema.ResourceImporter{
    State: schema.ImportStatePassthrough,
  },

  Timeouts: &schema.ResourceTimeout{
    Create: schema.DefaultTimeout(120 * time.Minute),
    Delete: schema.DefaultTimeout(120 * time.Minute),
  },

  Schema: map[string]*schema.Schema{
    "resource_arn": {
      Type:         schema.TypeString,
      Required:     true,
      ForceNew:     true,
      ValidateFunc: validateArn,
    },
    "kms_key_id": {
      Type:     schema.TypeString,
      Required: true,
      ForceNew: true,
    },
    "mode": {
      Type:     schema.TypeString,
      Required: true,
      ForceNew: true,
      ValidateFunc: validation.StringInSlice([]string{
        rds.ActivityStreamModeSync,
        rds.ActivityStreamModeAsync,
      }, false),
    },
    "kinesis_stream_name": {
      Type:     schema.TypeString,
      Computed: true,
    },
  },
}

Following the Provider Design Principles, the key names in schema.Schema stay consistent with the aws sdk api.

All of the Resource's inputs (resource_arn, kms_key_id, mode) are Required, which is dictated by StartActivityStreamInput in the underlying aws sdk.

These inputs are also ForceNew, again because of the aws sdk: for an rds activity stream the sdk only offers start and stop requests, and the aws console likewise confirms that an activity stream currently has no update operation. Updating this resource therefore amounts to deleting it and creating it again.

kinesis_stream_name is an output, but it still needs to be defined in schema.Schema so it can eventually be persisted into the state file. What makes it special is the Computed flag, which tells Terraform that this attribute will be computed at creation time.

schema.Schema also accepts functions for controlling how attributes are handled, such as the ValidateFunc used above.
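
To make ValidateFunc concrete, here is a hand-rolled equivalent of the validation.StringInSlice helper used for mode above. This is just an illustrative sketch; the SDK's ValidateFunc signature is func(val interface{}, key string) ([]string, []error):

"mode": {
	Type:     schema.TypeString,
	Required: true,
	ForceNew: true,
	// Illustrative hand-written validator, equivalent to
	// validation.StringInSlice([]string{...}, false).
	ValidateFunc: func(val interface{}, key string) (warns []string, errs []error) {
		v := val.(string)
		if v != rds.ActivityStreamModeSync && v != rds.ActivityStreamModeAsync {
			errs = append(errs, fmt.Errorf("%q must be %q or %q, got: %q",
				key, rds.ActivityStreamModeSync, rds.ActivityStreamModeAsync, v))
		}
		return
	},
},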

Defining the CRUD Operations

We omit the update function because all of the inputs are ForceNew.

Let's look at the create function first:

func resourceAwsRDSClusterActivityStreamCreate(d *schema.ResourceData, meta interface{}) error {
  conn := meta.(*AWSClient).rdsconn

  resourceArn := d.Get("resource_arn").(string)
  kmsKeyId := d.Get("kms_key_id").(string)
  mode := d.Get("mode").(string)

  startActivityStreamInput := &rds.StartActivityStreamInput{
    ResourceArn:      aws.String(resourceArn),
    ApplyImmediately: aws.Bool(true),
    KmsKeyId:         aws.String(kmsKeyId),
    Mode:             aws.String(mode),
  }

  log.Printf("[DEBUG] RDS Cluster start activity stream input: %s", startActivityStreamInput)

  resp, err := conn.StartActivityStream(startActivityStreamInput)
  if err != nil {
    return fmt.Errorf("error creating RDS Cluster Activity Stream: %s", err)
  }

  log.Printf("[DEBUG]: RDS Cluster start activity stream response: %s", resp)

  d.SetId(resourceArn)

  err = waiter.ActivityStreamStarted(conn, d.Id(), d.Timeout(schema.TimeoutCreate))
  if err != nil {
    return err
  }

  return resourceAwsRDSClusterActivityStreamRead(d, meta)
}

The function calls the aws sdk api, retries until the resource finishes creating, then calls the read function to pull the resource's state from aws into schema.ResourceData, which is finally persisted into state.
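waiter.ActivityStreamStarted lives in the provider's internal waiter package. A plausible sketch of it, built on the SDK's resource.StateChangeConf (the exact implementation is in the PR; the status constants come from aws-sdk-go):

func ActivityStreamStarted(conn *rds.RDS, dbClusterArn string, timeout time.Duration) error {
	stateConf := &resource.StateChangeConf{
		Pending: []string{rds.ActivityStreamStatusStarting},
		Target:  []string{rds.ActivityStreamStatusStarted},
		Timeout: timeout,
		// Refresh polls the cluster and reports its current
		// ActivityStreamStatus until it reaches the target state.
		Refresh: func() (interface{}, string, error) {
			out, err := conn.DescribeDBClusters(&rds.DescribeDBClustersInput{
				DBClusterIdentifier: aws.String(dbClusterArn),
			})
			if err != nil {
				return nil, "", err
			}
			if len(out.DBClusters) == 0 {
				return nil, "", nil
			}
			dbc := out.DBClusters[0]
			return dbc, aws.StringValue(dbc.ActivityStreamStatus), nil
		},
	}
	_, err := stateConf.WaitForState()
	return err
}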

This touches a core challenge of IaC: keeping code = cloud resource. Terraform stores cloud resource state in the state file, which serves as the single source of truth for what exists in the cloud. Terraform then diffs state against code, detects drift, and decides what to do next.

How do we avoid drift? Two practices:

  • the read function stores every attribute it reads from the cloud into state, i.e. the attributes in state are exactly what read pulled down
  • after create and update complete, call read to keep state truthful

Imagine there were no read step syncing state with the cloud: whatever attributes create and update wrote would be exactly what landed in state. If an attribute passed to create ended up differing from the corresponding attribute in the cloud while create still succeeded, we would hit the core IaC problem head-on: code and cloud resource out of sync.

With drift in mind, let's look at the read function:

func resourceAwsRDSClusterActivityStreamRead(d *schema.ResourceData, meta interface{}) error {
  conn := meta.(*AWSClient).rdsconn

  input := &rds.DescribeDBClustersInput{
    DBClusterIdentifier: aws.String(d.Id()),
  }

  log.Printf("[DEBUG] Describing RDS Cluster: %s", input)
  resp, err := conn.DescribeDBClusters(input)

  if isAWSErr(err, rds.ErrCodeDBClusterNotFoundFault, "") {
    log.Printf("[WARN] RDS Cluster (%s) not found, removing from state", d.Id())
    d.SetId("")
    return nil
  }

  if err != nil {
    return fmt.Errorf("error describing RDS Cluster (%s): %s", d.Id(), err)
  }

  if resp == nil {
    return fmt.Errorf("error retrieving RDS cluster: empty response for: %s", input)
  }

  var dbc *rds.DBCluster
  for _, c := range resp.DBClusters {
    if aws.StringValue(c.DBClusterArn) == d.Id() {
      dbc = c
      break
    }
  }

  if dbc == nil {
    log.Printf("[WARN] RDS Cluster (%s) not found, removing from state", d.Id())
    d.SetId("")
    return nil
  }

  if aws.StringValue(dbc.ActivityStreamStatus) == rds.ActivityStreamStatusStopped {
    log.Printf("[WARN] RDS Cluster (%s) Activity Stream already stopped, removing from state", d.Id())
    d.SetId("")
    return nil
  }

  d.Set("resource_arn", dbc.DBClusterArn)
  d.Set("kms_key_id", dbc.ActivityStreamKmsKeyId)
  d.Set("kinesis_stream_name", dbc.ActivityStreamKinesisStreamName)
  d.Set("mode", dbc.ActivityStreamMode)

  return nil
}

Fault tolerance: when the resource has disappeared, mark it as deleted with d.SetId("").

State sync: save every activity-stream-related attribute from the DBCluster in the response.
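
For completeness, the delete function (not shown above) mirrors create: it calls StopActivityStream and waits for the stream to stop. A sketch, assuming a waiter.ActivityStreamStopped helper analogous to the started waiter:

func resourceAwsRDSClusterActivityStreamDelete(d *schema.ResourceData, meta interface{}) error {
	conn := meta.(*AWSClient).rdsconn

	stopActivityStreamInput := &rds.StopActivityStreamInput{
		ResourceArn:      aws.String(d.Id()),
		ApplyImmediately: aws.Bool(true),
	}

	log.Printf("[DEBUG] RDS Cluster stop activity stream input: %s", stopActivityStreamInput)

	if _, err := conn.StopActivityStream(stopActivityStreamInput); err != nil {
		return fmt.Errorf("error stopping RDS Cluster Activity Stream: %s", err)
	}

	// Assumed helper, analogous to waiter.ActivityStreamStarted above.
	return waiter.ActivityStreamStopped(conn, d.Id(), d.Timeout(schema.TimeoutDelete))
}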

Acceptance Tests

Acceptance tests exercise real cloud resources; satisfying the Acceptance Test Checklists catches most bugs.

Because real aws resources are involved, account credentials must be prepared in advance. TestSweepers guarantee that the resources the tests create are cleaned up afterwards.
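A sweeper is registered with the SDK's resource.AddTestSweepers; a minimal sketch (the sweeper function here is hypothetical and its body is elided):

func init() {
	resource.AddTestSweepers("aws_rds_cluster_activity_stream", &resource.Sweeper{
		Name: "aws_rds_cluster_activity_stream",
		F:    testSweepRdsClusterActivityStreams,
	})
}

// testSweepRdsClusterActivityStreams is a hypothetical sweeper: it would list
// DB clusters in the region and stop any activity streams left behind by
// earlier test runs.
func testSweepRdsClusterActivityStreams(region string) error {
	// ...DescribeDBClusters and StopActivityStream for leftovers...
	return nil
}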

Pay particular attention to the disappears test, which handles the case where a resource was deleted outside of Terraform and is left dangling.
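
A disappears test deletes the resource out-of-band (here via the resource's own delete function) and expects the next plan to be non-empty. A sketch using the provider's conventional test helpers (testAccPreCheck, testAccProviders, testAccCheckResourceDisappears, and the config/exists/destroy helpers are assumed from the test suite):

func TestAccAWSRDSClusterActivityStream_disappears(t *testing.T) {
	var dbCluster rds.DBCluster
	resourceName := "aws_rds_cluster_activity_stream.test"

	resource.ParallelTest(t, resource.TestCase{
		PreCheck:     func() { testAccPreCheck(t) },
		Providers:    testAccProviders,
		CheckDestroy: testAccCheckAWSRDSClusterActivityStreamDestroy,
		Steps: []resource.TestStep{
			{
				Config: testAccAWSRDSClusterActivityStreamConfig(),
				Check: resource.ComposeTestCheckFunc(
					testAccCheckAWSRDSClusterActivityStreamExists(resourceName, &dbCluster),
					// Delete the activity stream behind Terraform's back.
					testAccCheckResourceDisappears(testAccProvider, resourceAwsRDSClusterActivityStream(), resourceName),
				),
				// The resource is gone, so the follow-up plan must want to
				// recreate it.
				ExpectNonEmptyPlan: true,
			},
		},
	})
}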

Reference

Terraform and The Extensible Provider Architecture

Creating a Terraform Provider for Just About Anything

5 Lessons Learned From Writing Over 300,000 Lines of Infrastructure Code