Use the redshift-data API with an AWS Glue Python Shell job

AWS Glue is a fully managed extract, transform, and load (ETL) service for processing large datasets from various sources for analytics and data processing. When you add an AWS Glue job, you can choose its type: Spark, Spark Streaming, or Python shell.

For one of my use cases I wanted to try the new "redshift-data" API service in an AWS Glue Python Shell script. The Amazon Redshift Data API can be used to run SQL queries on Amazon Redshift tables. To test the redshift-data API, I wrote a simple AWS Glue Python Shell job that executes a SQL statement. The job failed with:

UnknownServiceError: Unknown service: 'redshift-data'. Valid service names are: acm...

I was quite surprised by the error, and it made me check which Boto3 version the AWS Glue Python Shell script was importing.

import boto3
print('boto3 version :', boto3.__version__)

Surprisingly, the output is:

boto3 version : 1.9.203

At the time of writing, the current boto3 release is 1.16.45, whereas AWS Glue Python shell imports version 1.9.203. I was pretty sure I wasn't the first person to hit this issue, so I searched and came across the open issue.

User sarath-mec has provided the solution in the comment section for AWS Glue Python Shell both with and without Internet access.

Here, for my notes, I am simply pasting the solution for AWS Glue Python Shell with Internet access, with minor changes:

AWS Glue Python Shell with Internet

Add the awscli and boto3 whl files to the Python library path during Glue job execution. This option is slow, as it has to download and install the dependencies.

  1. Upload the whl files to an S3 bucket.
  2. Add the S3 whl file paths to the job's Python library path. Give the full S3 path of each whl file, separated by commas.
  3. Add the following code snippet to load the new files.
#Additional code as part of AWS Thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
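The snippet from that comment is not reproduced in full here. As a sketch of the same idea (my own, not sarath-mec's exact code): upgrade boto3 into a writable directory and make Python find it before Glue's bundled copy. The /tmp/glue-libs path is a placeholder:

```python
import subprocess
import sys

# Placeholder directory; any writable path inside the job container works.
TARGET = "/tmp/glue-libs"


def bootstrap_newer_boto3(target=TARGET):
    """Upgrade boto3 into a writable directory and put it first on sys.path.

    Run this before the first `import boto3` in the job script, so the
    upgraded copy is found ahead of Glue's bundled 1.9.x release.
    """
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--upgrade",
         "--target", target, "boto3"]
    )
    sys.path.insert(0, target)
```

In the job script you would call bootstrap_newer_boto3() at the very top, then `import boto3` and print boto3.__version__ to confirm the upgrade took effect.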

Script log:

boto3 version : 1.16.45 

Originally published at http://aprakash.wordpress.com on December 30, 2020.

Avid learner of technology solutions around databases, big-data, Machine Learning. 5x AWS Certified | 5x Oracle Certified. Connect on Twitter @anandp86