This article walks through the PR that adds a new resource to terraform-provider-aws for managing the Aurora Database Activity Stream.
Terraform's architecture splits into two main parts: Terraform Core and Terraform Plugins.
Terraform Core communicates with Terraform Plugins over RPC; HashiCorp's go-plugin implements this RPC-based plugin system in Go.
HashiCorp also provides the terraform-plugin-sdk, which makes implementing a Terraform Plugin straightforward.
The main responsibilities of Core and Provider can be summarized as:
Terraform Core:
- gRPC client
- user facing
- manage the whole resources graph
- manage state and ask providers to mutate resource state
- manage providers operations (Create, Read, Update, Delete) order
Terraform Provider:
- gRPC server
- execute domain-specific logic
- Create, Read, Update, Delete, Import, Validate a Resource
- Read a Data Source
- provide resource updated state to Terraform Core
- validate domain-specific inputs and handle errors
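The division of labor above can be pictured as Core driving a provider through a CRUD contract. The sketch below is purely illustrative (it is not the real terraform-plugin-sdk or gRPC interface): Core owns the state and the operation ordering, while the provider executes the domain-specific mutations.

```go
package main

import "fmt"

// ResourceProvider is a hypothetical, simplified stand-in for the gRPC
// contract between Terraform Core and a provider plugin.
type ResourceProvider interface {
	Create(attrs map[string]string) (id string, err error)
	Read(id string) (map[string]string, error)
	Delete(id string) error
}

// memProvider executes "domain-specific logic" against an in-memory store,
// playing the provider's role.
type memProvider struct {
	store map[string]map[string]string
}

func (p *memProvider) Create(attrs map[string]string) (string, error) {
	id := fmt.Sprintf("res-%d", len(p.store)+1)
	p.store[id] = attrs
	return id, nil
}

func (p *memProvider) Read(id string) (map[string]string, error) {
	attrs, ok := p.store[id]
	if !ok {
		return nil, fmt.Errorf("not found: %s", id)
	}
	return attrs, nil
}

func (p *memProvider) Delete(id string) error {
	delete(p.store, id)
	return nil
}

func main() {
	// "Core" side: decides the order of operations and persists the
	// refreshed state the provider hands back.
	var p ResourceProvider = &memProvider{store: map[string]map[string]string{}}
	id, _ := p.Create(map[string]string{"mode": "async"})
	state, _ := p.Read(id)
	fmt.Println(id, state["mode"])
}
```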
Creating a new resource
Let's introduce the concepts directly from this PR's code.
Opening the terraform-provider-aws codebase, main.go imports aws.Provider directly; inside the aws package you will find many files following the data_source_aws_*.go and resource_aws_*.go naming patterns.
This naming convention maps to the two main components of a Provider:
- Resources: represent a single AWS API object
- Data Sources: fetch data about an AWS object
The Aurora Activity Stream we want to enable is essentially what the AWS CLI's start-activity-stream command does. It is a standalone API object, so we add a new Resource: resource_aws_rds_cluster_activity_stream.go.
The entry point of the aws provider is the aws.Provider function:
```go
provider := &schema.Provider{
	DataSourcesMap: map[string]*schema.Resource{
		"aws_acm_certificate": dataSourceAwsAcmCertificate(),
	},
	ResourcesMap: map[string]*schema.Resource{
		"aws_acm_certificate":             resourceAwsAcmCertificate(),
		"aws_rds_cluster_activity_stream": resourceAwsRDSClusterActivityStream(),
	},
}
```
All we need to do is add "aws_rds_cluster_activity_stream" and its corresponding *schema.Resource to ResourcesMap.
Defining the Schema
Terraform schemas define the attributes and behaviors that a Data Source or Resource needs. The schema.Resource type is defined as:
```go
type Resource struct {
	// Schema is the schema for the configuration of this resource.
	//
	// The keys of this map are the configuration keys, and the values
	// describe the schema of the configuration value.
	//
	// The schema is used to represent both configurable data as well
	// as data that might be computed in the process of creating this
	// resource.
	Schema map[string]*Schema

	// The functions below are the CRUD operations for this resource.
	Create CreateFunc
	Read   ReadFunc
	Update UpdateFunc
	Delete DeleteFunc
}
```
The schema and CRUD functions for aws_rds_cluster_activity_stream are defined as follows:
```go
&schema.Resource{
	Create: resourceAwsRDSClusterActivityStreamCreate,
	Read:   resourceAwsRDSClusterActivityStreamRead,
	Delete: resourceAwsRDSClusterActivityStreamDelete,
	Importer: &schema.ResourceImporter{
		State: schema.ImportStatePassthrough,
	},

	Timeouts: &schema.ResourceTimeout{
		Create: schema.DefaultTimeout(120 * time.Minute),
		Delete: schema.DefaultTimeout(120 * time.Minute),
	},

	Schema: map[string]*schema.Schema{
		"resource_arn": {
			Type:         schema.TypeString,
			Required:     true,
			ForceNew:     true,
			ValidateFunc: validateArn,
		},
		"kms_key_id": {
			Type:     schema.TypeString,
			Required: true,
			ForceNew: true,
		},
		"mode": {
			Type:     schema.TypeString,
			Required: true,
			ForceNew: true,
			ValidateFunc: validation.StringInSlice([]string{
				rds.ActivityStreamModeSync,
				rds.ActivityStreamModeAsync,
			}, false),
		},
		"kinesis_stream_name": {
			Type:     schema.TypeString,
			Computed: true,
		},
	},
}
```
Following the Provider Design Principles, the keys in schema.Schema are named consistently with the AWS SDK API.
The resource's inputs (resource_arn, kms_key_id, mode) are all Required, which is dictated by the underlying AWS SDK's StartActivityStreamInput.
These inputs are also ForceNew, again because of the AWS SDK: the SDK exposes only start and stop requests for RDS activity streams, and the AWS console confirms there is currently no update operation for them. So updating this resource amounts to a delete followed by a create.
kinesis_stream_name is an output, but it still has to be declared in schema.Schema so that it ends up persisted in the state file. Its distinguishing flag is Computed, which tells Terraform the attribute will be computed during creation.
schema.Schema also lets you attach functions that control how attributes are handled, such as the ValidateFunc above.
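In the SDK, a ValidateFunc has the shape func(val interface{}, key string) ([]string, []error): it returns warnings and errors instead of failing hard. The snippet below is an SDK-free sketch of a hand-rolled validator in that shape, accepting the same two mode values that validation.StringInSlice checks above; the function name and the bare "sync"/"async" literals are illustrative, not from the PR.

```go
package main

import "fmt"

// validateActivityStreamMode mirrors the shape of a terraform-plugin-sdk
// ValidateFunc. The allowed values stand in for rds.ActivityStreamModeSync
// and rds.ActivityStreamModeAsync.
func validateActivityStreamMode(val interface{}, key string) (warns []string, errs []error) {
	s, ok := val.(string)
	if !ok {
		errs = append(errs, fmt.Errorf("%s: expected a string", key))
		return
	}
	switch s {
	case "sync", "async":
		// valid, nothing to report
	default:
		errs = append(errs, fmt.Errorf("%s: %q must be one of [sync async]", key, s))
	}
	return
}

func main() {
	_, errs := validateActivityStreamMode("banana", "mode")
	fmt.Println(len(errs)) // one validation error
}
```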
Defining the CRUD operations
We omit the update function, because all inputs are ForceNew.
First, the create function:
```go
func resourceAwsRDSClusterActivityStreamCreate(d *schema.ResourceData, meta interface{}) error {
	conn := meta.(*AWSClient).rdsconn

	resourceArn := d.Get("resource_arn").(string)
	kmsKeyId := d.Get("kms_key_id").(string)
	mode := d.Get("mode").(string)

	startActivityStreamInput := &rds.StartActivityStreamInput{
		ResourceArn:      aws.String(resourceArn),
		ApplyImmediately: aws.Bool(true),
		KmsKeyId:         aws.String(kmsKeyId),
		Mode:             aws.String(mode),
	}

	log.Printf("[DEBUG] RDS Cluster start activity stream input: %s", startActivityStreamInput)

	resp, err := conn.StartActivityStream(startActivityStreamInput)
	if err != nil {
		return fmt.Errorf("error creating RDS Cluster Activity Stream: %s", err)
	}

	log.Printf("[DEBUG]: RDS Cluster start activity stream response: %s", resp)

	d.SetId(resourceArn)

	err = waiter.ActivityStreamStarted(conn, d.Id(), d.Timeout(schema.TimeoutCreate))
	if err != nil {
		return err
	}

	return resourceAwsRDSClusterActivityStreamRead(d, meta)
}
```
The function simply calls the AWS SDK API, then keeps waiting (with retries) until the resource is fully created, and finally lets the read function pull the resource's state from AWS into schema.ResourceData, which is then persisted to state.
A core challenge of IaC is keeping code = cloud resource. Terraform stores the cloud resource's state in the state file, which acts as the single source of truth for what exists in the cloud. By diffing state against code, Terraform detects drift and decides what to do next.
How do we avoid drift? Two conventions help:
- Every attribute the read function fetches from the cloud is stored in state; in other words, what's in state is what read brought down.
- After create and update complete, read is always called, which keeps state faithful to reality.
Imagine there were no read step syncing state with the cloud: whatever attributes create or update sent would be exactly what landed in state. If an attribute sent by create diverged from the attribute actually set in the cloud, yet create still succeeded, we would hit precisely the core IaC problem: code and cloud resources out of sync.
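The drift check itself can be pictured as a key-by-key comparison between the attributes recorded in state and the attributes a fresh read just reported from the cloud. This is a toy sketch of the idea, not Terraform's actual diff engine:

```go
package main

import (
	"fmt"
	"sort"
)

// driftedKeys returns the attribute names whose value in state no longer
// matches what a fresh Read reported from the cloud. (For simplicity it
// ignores keys present only in cloud; Terraform's real diff is richer.)
func driftedKeys(state, cloud map[string]string) []string {
	var drifted []string
	for k, v := range state {
		if cloud[k] != v {
			drifted = append(drifted, k)
		}
	}
	sort.Strings(drifted)
	return drifted
}

func main() {
	state := map[string]string{"mode": "async", "kms_key_id": "key-1"}
	// "mode" was changed out-of-band, so it drifted.
	cloud := map[string]string{"mode": "sync", "kms_key_id": "key-1"}
	fmt.Println(driftedKeys(state, cloud)) // [mode]
}
```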
With drift in mind, let's look at the read function:
```go
func resourceAwsRDSClusterActivityStreamRead(d *schema.ResourceData, meta interface{}) error {
	conn := meta.(*AWSClient).rdsconn

	input := &rds.DescribeDBClustersInput{
		DBClusterIdentifier: aws.String(d.Id()),
	}

	log.Printf("[DEBUG] Describing RDS Cluster: %s", input)
	resp, err := conn.DescribeDBClusters(input)

	if isAWSErr(err, rds.ErrCodeDBClusterNotFoundFault, "") {
		log.Printf("[WARN] RDS Cluster (%s) not found, removing from state", d.Id())
		d.SetId("")
		return nil
	}

	if err != nil {
		return fmt.Errorf("error describing RDS Cluster (%s): %s", d.Id(), err)
	}

	if resp == nil {
		return fmt.Errorf("error retrieving RDS cluster: empty response for: %s", input)
	}

	var dbc *rds.DBCluster
	for _, c := range resp.DBClusters {
		if aws.StringValue(c.DBClusterArn) == d.Id() {
			dbc = c
			break
		}
	}

	if dbc == nil {
		log.Printf("[WARN] RDS Cluster (%s) not found, removing from state", d.Id())
		d.SetId("")
		return nil
	}

	if aws.StringValue(dbc.ActivityStreamStatus) == rds.ActivityStreamStatusStopped {
		log.Printf("[WARN] RDS Cluster (%s) Activity Stream already stopped, removing from state", d.Id())
		d.SetId("")
		return nil
	}

	d.Set("resource_arn", dbc.DBClusterArn)
	d.Set("kms_key_id", dbc.ActivityStreamKmsKeyId)
	d.Set("kinesis_stream_name", dbc.ActivityStreamKinesisStreamName)
	d.Set("mode", dbc.ActivityStreamMode)

	return nil
}
```
Error tolerance: when the resource has disappeared, we mark it as deleted via d.SetId("").
State sync: every activity-stream-related attribute in the DBCluster response is saved to state.
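The d.SetId("") branches above can be modeled as: when read cannot find the object (or finds the stream already stopped), it clears the ID instead of returning an error, telling Core the resource is gone and should be planned for re-creation. A toy stand-in, with a hypothetical readCluster helper:

```go
package main

import "fmt"

// readCluster is an illustrative stand-in for the Read function: clusters
// maps what DescribeDBClusters would return (ARN -> activity stream status),
// and id is the ARN stored in state. An empty returned id plays the role of
// d.SetId(""): remove from state so Core plans a re-creation.
func readCluster(clusters map[string]string, id string) (newID string, status string) {
	s, ok := clusters[id]
	if !ok {
		return "", "" // cluster disappeared out-of-band
	}
	if s == "stopped" {
		return "", "" // stream stopped outside Terraform: also remove
	}
	return id, s
}

func main() {
	clusters := map[string]string{"arn:cluster-1": "started"}
	id, status := readCluster(clusters, "arn:cluster-1")
	fmt.Println(id != "", status)
	id, _ = readCluster(clusters, "arn:cluster-gone")
	fmt.Println(id == "") // gone: cleared from state
}
```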
Acceptance Tests
Acceptance tests exercise real cloud resources; following the Acceptance Test Checklists catches most bugs.
Because real AWS resources are involved, you need to prepare account credentials. TestSweepers make sure the resources the tests created are cleaned up afterwards.
Pay particular attention to the disappears test: it handles the case where a resource was deleted outside of Terraform and would otherwise be left dangling.
Reference
Terraform and The Extensible Provider Architecture
Creating a Terraform Provider for Just About Anything
5 Lessons Learned From Writing Over 300,000 Lines of Infrastructure Code