
This article outlines the system design for automating the banking reconciliation process by migrating existing manual tasks to AWS. The solution leverages various AWS services to create a scalable, secure, and efficient system. The goal is to reduce manual effort, minimize errors, and enhance operational efficiency within the financial reconciliation workflow.
Key Objectives:
- Develop a user-friendly custom interface for managing reconciliation tasks.
- Utilize AWS services like Lambda, Glue, S3, and EMR for data processing automation.
- Implement robust security and monitoring mechanisms to ensure system reliability.
- Provide post-deployment support and monitoring for continuous improvement.
Architecture Overview
The architecture comprises several AWS services, each fulfilling specific roles within the system, and integrates with corporate on-premises resources via Direct Connect.
- Direct Connect: Connects the corporate data center to the AWS VPC over a dedicated private link, enabling fast, secure data transfer between on-premises systems and AWS services.
Data Ingestion
- Amazon S3 (Incoming Files Bucket): Acts as the primary data repository where incoming files are stored. The bucket triggers the Lambda function when new data is uploaded.
- Bucket Policy: Ensures that only authorized services and users can access and interact with the data stored in S3.
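As one concrete example, the sketch below uses boto3 to attach a common hardening policy that denies any non-TLS access to the incoming-files bucket. The bucket name is a placeholder, and a production policy would additionally scope access to the specific pipeline roles:

import json
import boto3

s3 = boto3.client('s3')

# Placeholder bucket name; substitute the real incoming-files bucket.
BUCKET = 'banking-reconciliation-bucket-123456789012'

# Deny every request that does not arrive over TLS.
policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            'Sid': 'DenyInsecureTransport',
            'Effect': 'Deny',
            'Principal': '*',
            'Action': 's3:*',
            'Resource': [
                f'arn:aws:s3:::{BUCKET}',
                f'arn:aws:s3:::{BUCKET}/*',
            ],
            'Condition': {'Bool': {'aws:SecureTransport': 'false'}},
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))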
Event-Driven Processing
- AWS Lambda: Placed in a private subnet, this function is triggered by S3 events (e.g., file uploads) and initiates data processing tasks.
- IAM Permissions: Lambda has permissions to access the S3 bucket and trigger the Glue ETL job.
Data Transformation
- AWS Glue ETL Job: Handles the extraction, transformation, and loading (ETL) of data from the S3 bucket, preparing it for further processing (a minimal job-script sketch follows this list).
- NAT Gateway: Located in a public subnet, the NAT Gateway allows the Lambda function and Glue ETL job to access the internet for downloading dependencies without exposing them to inbound internet traffic.
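For reference, the Glue job script itself might look like the minimal PySpark sketch below. It consumes the --s3_input_file argument that the triggering Lambda function passes in (shown later in this article); the transformation and the staging path are placeholders:

import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Resolve the arguments supplied by the triggering Lambda function.
args = getResolvedOptions(sys.argv, ['s3_input_file'])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read the raw reconciliation file delivered to S3.
df = spark.read.option('header', 'true').csv(args['s3_input_file'])

# Placeholder transformation: drop exact duplicate records.
cleaned = df.dropDuplicates()

# Hypothetical staging location read by the downstream EMR/Redshift steps.
cleaned.write.mode('overwrite').parquet('s3://your-staging-bucket/reconciliation/')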
Data Processing and Storage
- Amazon EMR: Performs complex transformations and applies business rules necessary for reconciliation processes, processing data securely within the private subnet.
- Amazon Redshift: Serves as the central data warehouse where processed data is stored, facilitating further analysis and reporting (a load sketch follows this list).
- RDS Proxy: Pools and manages secure database connections to the RDS instance that backs the reconciliation UI. (RDS Proxy fronts RDS and Aurora databases; Glue and EMR connect to Redshift through its own cluster endpoint.)
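To illustrate the load step, the sketch below uses the Redshift Data API to issue a COPY from a hypothetical staging location into a reconciliation table. The cluster, database, user, table, and IAM role names are all placeholders:

import boto3

redshift_data = boto3.client('redshift-data')

# Hypothetical identifiers; substitute your own cluster and schema.
COPY_SQL = """
    COPY reconciliation.transactions
    FROM 's3://your-staging-bucket/reconciliation/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

response = redshift_data.execute_statement(
    ClusterIdentifier='reconciliation-cluster',
    Database='reconciliation',
    DbUser='etl_user',
    Sql=COPY_SQL,
)

# The statement runs asynchronously; poll describe_statement with this Id.
print(response['Id'])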
Business Intelligence
- Amazon QuickSight: A visualization tool that provides dashboards and reports based on the data stored in Redshift, helping users to make informed decisions.
User Interface
- Reconciliation UI: Hosted on AWS and integrated with RDS, this custom UI allows finance teams to manage reconciliation tasks efficiently.
- Okta SSO: Provides single sign-on for the UI, federated with the corporate Azure AD directory, ensuring that only authorized users can access the reconciliation UI.
Orchestration and Workflow Management
- AWS Step Functions: Orchestrates the entire workflow, ensuring that each step in the reconciliation process is executed in sequence and managed effectively (a kickoff sketch follows this list).
- Parameter Store: Holds configuration data, allowing dynamic and flexible workflow management.
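The sketch below shows one way these two pieces fit together: it reads a configuration value from Parameter Store and starts a Step Functions execution with it. The state machine ARN and parameter name are assumptions:

import json
import boto3

ssm = boto3.client('ssm')
sfn = boto3.client('stepfunctions')

# Hypothetical parameter holding per-run configuration for the workflow.
param = ssm.get_parameter(Name='/reconciliation/config', WithDecryption=True)
config = json.loads(param['Parameter']['Value'])

# Start the reconciliation workflow with the resolved configuration.
execution = sfn.start_execution(
    stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:reconciliation',
    input=json.dumps({'config': config, 'source': 'scheduled-run'}),
)

print(execution['executionArn'])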
Security and Monitoring
- AWS Secrets Manager: Securely stores and manages credentials needed by various AWS services (a retrieval sketch follows this list).
- Monitoring and Logging:
  - Scalyr: Provides backend log collection and analysis, enabling visibility into system operations.
  - New Relic: Monitors application performance and tracks key metrics to alert on any issues or anomalies.
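For example, a service that needs database credentials could fetch them at startup as in the sketch below; the secret name and its JSON field names are assumptions:

import json
import boto3

secrets = boto3.client('secretsmanager')

# Hypothetical secret storing the reconciliation database credentials.
secret = secrets.get_secret_value(SecretId='reconciliation/db-credentials')
credentials = json.loads(secret['SecretString'])

# Use the credentials to open a database connection; never log them.
db_user = credentials['username']
db_password = credentials['password']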
Notifications
- AWS SNS: Sends notifications to users about the status of reconciliation tasks, including completions, failures, or other important events.
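A task-status notification might be published as in the sketch below; the topic ARN is a placeholder:

import boto3

sns = boto3.client('sns')

# Hypothetical topic subscribed to by the finance team.
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:reconciliation-status',
    Subject='Reconciliation run completed',
    Message='The nightly reconciliation job finished successfully.',
)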
Security Considerations
Least Privilege Principle:
All IAM roles and policies are configured to ensure that each service has only the permissions necessary to perform its functions, reducing the risk of unauthorized access.
Encryption:
Data is encrypted at rest in S3 and Redshift, and in transit via TLS, meeting compliance and security standards (a configuration sketch follows this section).
Network Security:
The use of private subnets, security groups, and network ACLs ensures that resources are securely isolated within the VPC, protecting them from unauthorized access.
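As a concrete example of the encryption-at-rest requirement, the sketch below enables default server-side encryption on the incoming-files bucket with boto3. The bucket name is a placeholder, and SSE-KMS with a customer-managed key would be an equally valid choice:

import boto3

s3 = boto3.client('s3')

# Enforce default server-side encryption (SSE-S3) on every new object.
s3.put_bucket_encryption(
    Bucket='banking-reconciliation-bucket-123456789012',
    ServerSideEncryptionConfiguration={
        'Rules': [
            {
                'ApplyServerSideEncryptionByDefault': {
                    'SSEAlgorithm': 'AES256'
                }
            }
        ]
    },
)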
Code Implementation
Below are the key pieces of code required to implement the Lambda function and the CloudFormation template for the AWS infrastructure.
Lambda Python Code to Trigger Glue
Here’s a Python code snippet that can be deployed as part of the Lambda function to trigger the Glue ETL job upon receiving a new file in the S3 bucket:
import json
import logging
import urllib.parse

import boto3

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize the Glue and S3 clients
glue_client = boto3.client('glue')
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    """
    Lambda function to trigger an AWS Glue job when a new file is uploaded to S3.
    """
    try:
        # Extract the bucket name and object key from the event.
        # Keys arrive URL-encoded in S3 events, so decode them first.
        bucket_name = event['Records'][0]['s3']['bucket']['name']
        object_key = urllib.parse.unquote_plus(
            event['Records'][0]['s3']['object']['key'])

        # Log the file details
        logger.info(f"File uploaded to S3 bucket {bucket_name}: {object_key}")

        # Define the Glue job name
        glue_job_name = "your_glue_job_name"

        # Start the Glue job with the required arguments
        response = glue_client.start_job_run(
            JobName=glue_job_name,
            Arguments={
                '--s3_input_file': f"s3://{bucket_name}/{object_key}",
                '--other_param': 'value'  # Add any other necessary Glue job parameters here
            }
        )

        # Log the response from Glue
        logger.info(f"Started Glue job: {response['JobRunId']}")
    except Exception as e:
        logger.error(f"Error triggering Glue job: {str(e)}")
        raise
The Lambda function code is structured as follows:
- Import Libraries: Imports json, boto3, logging, and urllib.parse to handle JSON data, interact with AWS services, manage logging, and decode object keys.
- Set Up Logging: Configures logging to capture INFO-level messages, which is crucial for monitoring and debugging the Lambda function.
- Initialize AWS Clients: Initializes the Glue and S3 clients using boto3 to interact with these AWS services.
- Define Lambda Handler Function: The main function, lambda_handler(event, context), serves as the entry point and handles events triggered by S3.
- Extract Event Data: Retrieves the S3 bucket name (bucket_name) and URL-decoded object key (object_key) from the event data passed to the function.
- Log File Details: Logs the bucket name and object key of the uploaded file to help track what is being processed.
- Trigger Glue Job: Initiates a Glue ETL job using start_job_run with the S3 object passed as input, kicking off the data transformation process.
- Log Job Run ID: Logs the Glue job's JobRunId for tracking purposes, helping to monitor the job's progress.
- Error Handling: Catches and logs any exceptions that occur during execution to ensure issues are identified and resolved quickly.
- IAM Role Configuration: Ensure the Lambda execution role has the necessary permissions (glue:StartJobRun, s3:GetObject, etc.) to interact with AWS resources securely.
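To sanity-check the handler before deploying, you can feed it a hand-built S3 event locally, as in the sketch below. It assumes the code above is saved as index.py; the bucket and key are sample values, and the Glue call will only succeed with real AWS credentials and an existing job:

from index import lambda_handler

# Minimal S3 put event in the shape Lambda receives from S3 notifications.
sample_event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'banking-reconciliation-bucket-123456789012'},
                'object': {'key': 'incoming/statement_2024-01-31.csv'},
            }
        }
    ]
}

lambda_handler(sample_event, context=None)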
CloudFormation Template
Below is the CloudFormation template that defines the infrastructure required for this architecture:
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template for automating banking reconciliation on AWS

Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    DependsOn: LambdaInvokePermission
    Properties:
      BucketName: !Sub 'banking-reconciliation-bucket-${AWS::AccountId}'
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      VersioningConfiguration:
        Status: Enabled
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt LambdaFunction.Arn

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: lambda-glue-execution-role
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: lambda-glue-policy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - glue:StartJobRun
                  - glue:GetJobRun
                  - s3:GetObject
                  - s3:PutObject
                Resource: '*'

  GlueJobRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: glue-etl-job-role
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: glue.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
      Policies:
        - PolicyName: glue-s3-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                Resource: !Sub 'arn:aws:s3:::banking-reconciliation-bucket-${AWS::AccountId}/*'

  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: trigger-glue-job
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: python3.12
      Timeout: 60
      Code:
        ZipFile: |
          import json
          import logging
          import urllib.parse

          import boto3

          logger = logging.getLogger()
          logger.setLevel(logging.INFO)

          glue_client = boto3.client('glue')
          s3_client = boto3.client('s3')

          def lambda_handler(event, context):
              try:
                  bucket_name = event['Records'][0]['s3']['bucket']['name']
                  object_key = urllib.parse.unquote_plus(
                      event['Records'][0]['s3']['object']['key'])
                  logger.info(f"File uploaded to S3 bucket {bucket_name}: {object_key}")
                  glue_job_name = "your_glue_job_name"
                  response = glue_client.start_job_run(
                      JobName=glue_job_name,
                      Arguments={
                          '--s3_input_file': f"s3://{bucket_name}/{object_key}",
                          '--other_param': 'value'
                      }
                  )
                  logger.info(f"Started Glue job: {response['JobRunId']}")
              except Exception as e:
                  logger.error(f"Error triggering Glue job: {str(e)}")
                  raise

  LambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt LambdaFunction.Arn
      Action: lambda:InvokeFunction
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Sub 'arn:aws:s3:::banking-reconciliation-bucket-${AWS::AccountId}'

  GlueJob:
    Type: AWS::Glue::Job
    Properties:
      Name: your_glue_job_name
      Role: !GetAtt GlueJobRole.Arn
      Command:
        Name: glueetl
        ScriptLocation: !Sub 's3://${S3Bucket}/scripts/glue_etl_script.py'
        PythonVersion: '3'
      DefaultArguments:
        '--job-bookmark-option': job-bookmark-disable
      MaxRetries: 1
      ExecutionProperty:
        MaxConcurrentRuns: 1
      GlueVersion: '4.0'
      Timeout: 2880

Outputs:
  S3BucketName:
    Description: Name of the S3 bucket created for incoming files
    Value: !Ref S3Bucket
    Export:
      Name: S3BucketName
  LambdaFunctionName:
    Description: Name of the Lambda function that triggers the Glue job
    Value: !Ref LambdaFunction
    Export:
      Name: LambdaFunctionName
  GlueJobName:
    Description: Name of the Glue job that processes the incoming files
    Value: !Ref GlueJob
    Export:
      Name: GlueJobName
This CloudFormation template sets up the following resources:
- S3 Bucket: Stores incoming files and, via its NotificationConfiguration, invokes the Lambda function whenever a new object is created.
- Lambda Execution Role: An IAM role with the permissions the Lambda function needs to interact with S3 and Glue.
- Glue Job Role: A separate IAM role trusted by the Glue service, in keeping with the least-privilege principle described above.
- Lambda Function: The function that is triggered when a new object is created in the S3 bucket, which then starts the Glue ETL job.
- Lambda Invoke Permission: Grants the S3 service permission to invoke the Lambda function for events from this bucket.
- Glue Job: Configures the Glue ETL job that processes the incoming data.
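To deploy the stack, you might use boto3 as in the sketch below. The stack name and template file path are placeholders, and CAPABILITY_NAMED_IAM is required because the template creates named IAM roles:

import boto3

cloudformation = boto3.client('cloudformation')

# Hypothetical local path to the template shown above.
with open('reconciliation-stack.yaml') as f:
    template_body = f.read()

cloudformation.create_stack(
    StackName='banking-reconciliation',
    TemplateBody=template_body,
    Capabilities=['CAPABILITY_NAMED_IAM'],
)

# Block until the stack finishes creating (raises on failure).
waiter = cloudformation.get_waiter('stack_create_complete')
waiter.wait(StackName='banking-reconciliation')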
Taken together, these components provide a comprehensive, automated approach to banking reconciliation on AWS, replacing manual effort with an event-driven, monitored pipeline.