Chemical Company Reduces Cost and Boosts Productivity with AWS Data Lake

The Challenge

A global chemical and ingredients distributor found it challenging to store and analyze data because of its legacy data storage platform. Other problems with the legacy platform included the inability to process real-time data, sequential data hops, data duplication, and a lack of uniform data standards across the company's divisions. The client needed an efficient data platform that offered more agility and flexibility than traditional data management systems, and approached iLink because of our prior success delivering similar projects, our technology expertise, and our understanding of the industry landscape.

iLink’s Approach

iLink’s experts interviewed key stakeholders from business and technology teams to get a holistic understanding of the process and their pain points.

After extensive research focused on key metrics (cost reduction, efficiency, productivity, and customer acquisition/retention), iLink proposed an AWS data lake solution to the client.

Technical Details

Below are the technical architectural details of the AWS data lake implementation:

  • S3 acts as the central data hub for serving data
  • The AWS Glue Data Catalog crawls S3 objects to generate schema definitions and integrates with EMR, Athena, and Redshift Spectrum
  • DynamoDB stores S3 object index values as well as process-control metadata
  • Lambda and EMR serve as the data integration layer, delivering data to data marts such as RDS, Aurora, SQL Server, Redshift, and SFDC
  • Athena is used for in-place queries against S3
  • S3 data is organized for easy access by machine learning workloads in SageMaker
  • CloudWatch and CloudTrail provide logging and auditing
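To illustrate the in-place query pattern above, the sketch below submits a SQL statement to Athena with boto3. The database, table, and bucket names are illustrative placeholders, not the client's actual resources; the request-building helper is kept separate from the AWS call so it can be inspected without credentials.

```python
# Sketch of an in-place Athena query against the S3 data hub.
# All resource names here are illustrative assumptions.

def build_query_request(sql: str, database: str, output_s3: str) -> dict:
    """Assemble the keyword arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

def run_athena_query(sql: str, database: str, output_s3: str) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    import boto3  # imported here so the pure helper above needs no AWS SDK
    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_query_request(sql, database, output_s3)
    )
    return response["QueryExecutionId"]
```

Athena writes results to the configured S3 output location, so downstream tools (or a follow-up `get_query_results` call) can pick them up once the execution succeeds.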

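The DynamoDB object index in the architecture above could be maintained by a Lambda function triggered on S3 object creation. The following is a minimal sketch under that assumption; the table name and attribute schema are hypothetical, and the item builder is a plain function so its output shape can be checked without the AWS SDK.

```python
# Sketch of a Lambda handler recording an S3 object index entry in DynamoDB.
# Table and attribute names are illustrative assumptions, not the client's schema.

def build_index_item(bucket: str, key: str, size: int, etag: str) -> dict:
    """Shape an S3 object reference as a DynamoDB item (typed attribute values)."""
    return {
        "s3_uri": {"S": f"s3://{bucket}/{key}"},
        "object_key": {"S": key},
        "size_bytes": {"N": str(size)},  # DynamoDB numbers travel as strings
        "etag": {"S": etag},
    }

def lambda_handler(event, context):
    """Triggered by an S3 PutObject event; indexes each new object in DynamoDB."""
    import boto3  # imported here so build_index_item stays testable without the SDK
    dynamodb = boto3.client("dynamodb")
    for record in event["Records"]:
        obj = record["s3"]["object"]
        item = build_index_item(
            record["s3"]["bucket"]["name"], obj["key"], obj["size"], obj["eTag"]
        )
        dynamodb.put_item(TableName="s3-object-index", Item=item)  # hypothetical table
    return {"indexed": len(event["Records"])}
```

Keeping the index in DynamoDB gives downstream jobs a fast lookup of which objects exist without listing S3 prefixes on every run.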
The Outcome

Reduced costs, improved efficiency, and stronger customer acquisition and retention for the client

Our AWS data lake solution reduced costs, boosted productivity, and enabled the following new capabilities for the client:

  • Provided a single source of truth for all data needs, with tighter data integrity, improved accuracy, and reduced data redundancy
  • Scaled cost-effectively to high data volumes
  • Supported different types of analytics, such as machine learning, ad hoc queries, big data analytics, full-text search, and real-time analytics, over multiple data sources stored in the data lake
  • Allowed the client to generate effective insights, including reporting on historical data and predictive analytics through machine learning models for forecasting and recommendations
  • Allowed various roles within the organization, such as data scientists, data analysts, and business analysts, to access data with their choice of analytics tools