PyTorch and fastai have two main classes for representing and accessing a training set or validation set:

  • Dataset:: A collection that returns a tuple of your independent and dependent variable for a single item

  • DataLoader:: An iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables

On top of these, fastai provides two classes for bringing your training and validation sets together:

  • DataLoaders:: An object that contains a training DataLoader and a validation…
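The batching behaviour a DataLoader provides can be illustrated with a toy sketch in plain Python (this is not the fastai or PyTorch implementation, just an illustration of the idea):

```python
# Toy illustration of what a DataLoader does: iterate over a dataset of
# (independent, dependent) pairs and yield mini-batches, where each
# mini-batch is a tuple (batch of xs, batch of ys).
def batches(dataset, batch_size):
    for i in range(0, len(dataset), batch_size):
        chunk = dataset[i:i + batch_size]
        xs = [x for x, y in chunk]
        ys = [y for x, y in chunk]
        yield (xs, ys)

dataset = [(x, x * 2) for x in range(6)]   # 6 items: x and its label 2x
mini_batches = list(batches(dataset, batch_size=4))
# First mini-batch: ([0, 1, 2, 3], [0, 2, 4, 6]); the last batch is smaller
```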

Photo by Nam Anh on Unsplash

In this blog post I will cover what the JSON data type is, what options PostgreSQL offers for storing JSON data, how you can create an AWS Glue connection to an Aurora PostgreSQL database running in a private subnet, and how you can then use AWS Glue to write data into a table with a JSONB column in an Aurora/RDS PostgreSQL database.

JSON (JavaScript Object Notation) is a format for storing data. The data is stored as key/value (or, you could say, name/value) pairs. The object below defines a data object with "data" as the key and the rest as its value.


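For example, a minimal JSON object with a single key "data" might look like the following (the field names are made up for illustration); Python's json module can parse it:

```python
import json

# A JSON object whose single key "data" holds the rest as its value.
# The field names below are illustrative, not from any real dataset.
doc = '{"data": {"id": 1, "name": "anand"}}'

parsed = json.loads(doc)       # parse the JSON text into a Python dict
print(parsed["data"]["name"])  # -> anand
```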

The metric is there to drive human understanding, while the loss is there to drive automated learning.

Stochastic Gradient Descent –

As Arthur Samuel described machine learning:

Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would “learn” from its experience.
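Samuel's description maps directly onto stochastic gradient descent: measure performance with a loss, then alter the weights in the direction that improves it. A minimal sketch with a single weight (the toy dataset and learning rate are made up for illustration):

```python
# Minimal SGD sketch: fit y = w * x on a toy dataset.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x

w = 0.0    # initial weight assignment
lr = 0.1   # learning rate

for _ in range(100):                 # the "automatic means" of improvement
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x    # gradient of squared error w.r.t. w
        w -= lr * grad               # alter the weight to reduce the loss

# w converges toward 2.0 — the machine has "learned" from its experience
```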

Instead of trying to find the similarity between…

Image by Jon Stewart from Pixabay

Chapter 1 Notes —

I want to start this blog with some of my personal notes before jumping on the book.

2. What is a norm?

A norm is a measure of distance and has three properties:

  • Distances scale with scalar multiplication: ||av|| = |a|·||v||
  • Travelling from A to B and then from B to C is at least as far as going directly from A to C (the triangle inequality): ||v + w|| ≤ ||v||…
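These properties can be checked numerically for the familiar Euclidean norm (plain Python, with arbitrary example vectors):

```python
import math

def norm(v):
    """Euclidean norm ||v|| of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

v = [3.0, 4.0]
w = [1.0, 2.0]
a = -2.0

# Scalar multiplication: ||a v|| == |a| * ||v||
assert math.isclose(norm([a * x for x in v]), abs(a) * norm(v))

# Triangle inequality: ||v + w|| <= ||v|| + ||w||
assert norm([x + y for x, y in zip(v, w)]) <= norm(v) + norm(w)
```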

Image by Free-Photos from Pixabay

For quite some time I had been thinking of starting to write about data science and machine learning, but somehow it wasn’t happening. Recently, I joined a 25-week-long fastbook reading session organized by Weights & Biases and led by Aman Arora (awesome instructor 👏). This is my first time participating in a reading session and it has been an amazing experience. There is so much participation during and after the lecture that it makes learning fun.

I would also like to thank Rachel Thomas, Jeremy Howard, and Sylvain Gugger for their contribution to building a community around AI and for

Image by Tim Hill from Pixabay

In this blog post I discuss how to export a 100 GB non-partitioned table from Aurora PostgreSQL to Amazon S3. I will walk you through two approaches you can use to export the data: first using aws_s3, a PostgreSQL extension that Aurora PostgreSQL provides, and then using the AWS Glue service. The post also covers the performance and scaling challenges encountered when exporting the table using AWS Glue.

Basic Information

  • Database Details — I am running PostgreSQL version 12.4 on Aurora version 4.0.2. The writer instance is a db.r5.2xlarge with 8 vCPUs, 2 threads per core, and 64 GB of memory.
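The aws_s3 extension exposes `aws_s3.query_export_to_s3` for exporting query results directly to S3. Here is a sketch of the call, built as a SQL string in Python (the bucket, prefix, region, and table names are placeholders; in practice the statement would be executed through a driver such as psycopg2):

```python
# Sketch of an aws_s3 export statement. The bucket, prefix, region, and
# table below are placeholders, not real resources.
table = "my_table"
s3_uri = "aws_commons.create_s3_uri('my-bucket', 'exports/my_table', 'us-east-1')"

export_sql = (
    "SELECT * FROM aws_s3.query_export_to_s3("
    f"'SELECT * FROM {table}', {s3_uri}, options := 'format csv')"
)
print(export_sql)
```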

AWS Amplify

Disclaimer: I am not a web developer and I do not have knowledge of JavaScript. The code snippet is mostly taken from different sources and tweaked for my specific use case. :)

I was looking for a way to enable a user to upload a bunch of small (< 10 MB) CSV files to a specific S3 bucket. While exploring the options I came across the AWS Amplify service, and I felt AWS Amplify makes it easy to deploy a full-stack web app for someone like me who does not have much knowledge of web app development. …

Image by Hebi B. from Pixabay

Data transformation is an important aspect of data engineering and can be a challenging task depending on the dataset and the transformation requirements. A bug in data transformation can have a severe impact on the final data set generated, leading to data issues. In this blog I am going to share my experience of having missing values in a Pandas DataFrame, handling these missing values in Pandas, and converting the Pandas DataFrame to a Spark DataFrame.

To give a quick background, I was writing a data transformation (ETL) job in AWS Glue using PySpark which was to be executed every 15 minutes. The…
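The core of the issue is that Pandas represents missing values as the float NaN, which can surprise a downstream system expecting proper nulls. A stdlib-only sketch of the idea, replacing NaNs with None before handing rows on (the records below are made up for illustration):

```python
import math

# Pandas represents missing values as float NaN. Before handing rows to
# another system (e.g. building a Spark DataFrame), NaNs can be replaced
# with None or a default value. These records are illustrative only.
records = [{"id": 1, "score": 9.5}, {"id": 2, "score": float("nan")}]

def clean(row, default=None):
    """Return a copy of row with any NaN value replaced by default."""
    return {
        k: default if isinstance(v, float) and math.isnan(v) else v
        for k, v in row.items()
    }

cleaned = [clean(r) for r in records]
# cleaned[1]["score"] is now None instead of NaN
```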

Image by GraphicMama-team from Pixabay

AWS Glue is a serverless ETL service for processing large datasets from various sources for analytics and data processing. Recently I came across a “CSV data source does not support map data type” error for a newly created Glue job. In a nutshell, the job was performing the steps below:

  1. Perform some required transformations
  2. Write the transformed data to Amazon Redshift using write_dynamic_frame_from_jdbc_conf

And it was during this write step that the Glue job was failing. Let’s look into it in a little more detail:

datasource0 = glueContext.create_dynamic_frame_from_options(…

Image by Peggy und Marco Lachmann-Anke from Pixabay

In this blog post I will discuss the following scenarios for connecting to databases from an AWS Lambda function:

  • Connecting to a cross-account Amazon Redshift database in a public subnet with public accessibility set to Yes.

Connect to an Amazon Aurora PostgreSQL database in a private subnet with public accessibility set to No, in the same AWS account

In this setup, the Amazon Aurora PostgreSQL database is running in a private subnet with public accessibility set to No. The connectivity and security details are as follows:

Anand Prakash

Avid learner of technology solutions around databases, big-data, Machine Learning. 5x AWS Certified | 5x Oracle Certified. Connect on Twitter @anandp86
