Cloud Infrastructure Platform for Enterprise Data Management
Background
The organization needed a robust, scalable, and automated cloud infrastructure platform to support Enterprise Data Management workloads, including MDM, data quality, integration, and analytics.
Rather than building isolated environments for each project, the goal was to design a reusable infrastructure platform that could:
- Support multiple data domains (MDM, DQ, ETL, EDW)
- Scale dynamically with workload demands
- Ensure consistent deployments across environments
- Be fully automated through CI/CD and Infrastructure-as-Code
The solution was designed as an Enterprise Data Management (EDM) Platform, in which Informatica SaaS provides the central data management capabilities and AWS serves as the backbone for runtime, storage, security, and supporting platform services. The full platform is deployed and governed through automated pipelines in Azure DevOps.
Objectives
- Design standardized infrastructure for Dev, Test, UAT, and Production
- Implement Infrastructure-as-Code (IaC)
- Enable fully automated CI/CD pipelines
- Ensure secure and parameterized deployments
- Optimize compute sizing and cost efficiency
- Provide a stable runtime platform for data services (e.g., Informatica Secure Agents, Connections, Schedulers, and Access Management through groups and roles)
Solution
1. Cloud Compute Architecture
The EDM platform is centered on Informatica SaaS, with AWS providing the operational backbone for runtime services, storage, identity, and extensibility.
· Amazon EC2 instances host runtime components such as Informatica Secure Agents and related execution services. Their configuration is automated, and the platform team defines sizing and instance counts based on the workload and EDM use cases being supported.
· Amazon S3 provides the primary storage foundation for platform data exchange and persistence.
· AWS IAM enforces secure access and role-based integration across platform components.
· Additional AWS services such as DynamoDB, SNS/SQS, and Athena can be incorporated for specific EDM use cases where additional eventing, metadata, query, or integration capabilities are required.
· Instance types are environment-specific (e.g., smaller for Dev, scaled for Production).
· A horizontal scaling strategy allows instances to be added as workload increases.
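The sizing and scaling rules above can be illustrated with a minimal sketch. It assumes a per-environment profile with an instance type and lower and upper instance limits, and derives the instance count from the number of concurrent jobs. The instance types, limits, and the names ComputeProfile, PROFILES, and required_instances are hypothetical and are not taken from the platform itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeProfile:
    """Per-environment compute settings (all values are illustrative)."""
    instance_type: str
    min_instances: int
    max_instances: int


# Hypothetical profiles: smaller for Dev, scaled for Production.
PROFILES = {
    "dev": ComputeProfile("t3.large", 1, 1),
    "test": ComputeProfile("t3.xlarge", 1, 2),
    "uat": ComputeProfile("m5.xlarge", 1, 2),
    "prod": ComputeProfile("m5.2xlarge", 2, 6),
}


def required_instances(env: str, concurrent_jobs: int, jobs_per_instance: int) -> int:
    """Return the number of agent hosts needed, clamped to the environment's limits."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    profile = PROFILES[env]
    needed = math.ceil(concurrent_jobs / jobs_per_instance)
    # Horizontal scaling: add instances with load, but never leave the allowed range.
    return max(profile.min_instances, min(needed, profile.max_instances))
```

In practice, a figure such as jobs_per_instance would have to come from workload benchmarking rather than from a fixed constant.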
2. Storage and Data Layer Foundation
Amazon S3 serves as the centralized storage backbone, organized into three zones:
- Raw zone
- Staging zone
- Curated zone
· Additional AWS data services such as Redshift, Athena, DynamoDB, and messaging components can be provisioned as needed to support analytics, persistence, and event-driven EDM scenarios.
All infrastructure components are provisioned through Infrastructure-as-Code templates to ensure repeatability, consistency, and easier maintenance.
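As an illustration of the zone layout described in this section, the sketch below builds object keys for the raw, staging, and curated zones. The key convention (zone, data domain, dataset, load date) and the function names are assumptions made for the example; the case study does not specify the actual prefix structure.

```python
from datetime import date

# Zone names follow the section above; their order is the assumed promotion order.
ZONES = ("raw", "staging", "curated")


def object_key(zone: str, domain: str, dataset: str, load_date: date, filename: str) -> str:
    """Build an S3 object key under a hypothetical zone/domain/dataset convention."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return f"{zone}/{domain}/{dataset}/load_date={load_date.isoformat()}/{filename}"


def next_zone_key(key: str) -> str:
    """Return the key the same object would have in the following zone."""
    zone, _, rest = key.partition("/")
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    position = ZONES.index(zone)
    if position == len(ZONES) - 1:
        raise ValueError("object is already in the final zone")
    return f"{ZONES[position + 1]}/{rest}"
```

Keeping the zone as the first key segment means that access policies and lifecycle rules could be scoped per zone by prefix.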
3. Infrastructure as Code (IaC)
All core EDM platform components are defined declaratively through Infrastructure-as-Code, covering both the AWS backbone and the associated Informatica platform configuration:
· EC2 configuration automation
· IAM roles and policies
· S3 bucket structure
· Redshift clusters
· Networking configurations
· Informatica Runtime Environments (Secure Agents)
· Informatica Connections
· Informatica Schedulers
· Informatica Access Management, including groups and roles
For compute resources, the team contributes to architecture decisions on instance sizing and the number of EC2 instances required, based on workload characteristics and the EDM use cases being supported.
Key principles:
- No manual console deployments
- Environment variables separated from templates
- Version-controlled infrastructure definitions
- Repeatable deployments across environments
This approach ensures that Dev, Test, UAT, and Production remain structurally identical, differing only in configuration parameters.
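One way to check the "structurally identical" property is to compare the parameter keys defined for each environment and report any that are absent. The sketch below is a minimal version of such a check. It assumes parameters are held as nested dictionaries, and the function names and sample keys are hypothetical.

```python
def flatten_keys(params: dict, prefix: str = "") -> set:
    """Collect dotted key paths (e.g. 'ec2.instance_type') from nested parameters."""
    keys = set()
    for name, value in params.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
        else:
            keys.add(path)
    return keys


def missing_keys(params_by_env: dict) -> dict:
    """Map each environment to the parameter keys it lacks relative to all others."""
    keys_by_env = {env: flatten_keys(p) for env, p in params_by_env.items()}
    all_keys = set().union(*keys_by_env.values()) if keys_by_env else set()
    report = {}
    for env, keys in keys_by_env.items():
        absent = sorted(all_keys - keys)
        if absent:
            report[env] = absent
    return report
```

An empty report means every environment defines the same set of parameters, so the environments differ only in values. A check like this could run as a validation step before any deployment.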
4. CI/CD Automation
CI/CD pipelines were implemented as Azure DevOps YAML pipelines. They provide end-to-end deployment and configuration automation for both infrastructure and platform services:
· Automated AWS infrastructure deployment through Infrastructure-as-Code
· Provisioning and deployment of Informatica Runtime Environments (Secure Agents)
· Automated deployment of Informatica Connections across environments
· Automated setup and promotion of Informatica Schedulers
· Automated Informatica Access Management configuration, including groups and roles
· Parameter-driven environment selection
· Promotion across environments (Dev → Test → UAT → Prod)
· Execution of validation scripts
· Controlled release approvals for higher environments
Branching strategy:
- Feature branches → Development
- Development → Test
- Test → Release
Pipelines dynamically inject:
- Environment variables
- AWS credentials (from secured pipeline libraries)
- Deployment parameters
This eliminates manual configuration drift and improves traceability and auditability.
This approach ensures that Informatica platform components are promoted consistently alongside the underlying AWS infrastructure, with reusable pipeline templates governing agent deployment, connection configuration, scheduler setup, and role-based access control through standardized groups and roles.
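The promotion and approval rules in this section can be expressed as a small gate function. The sketch below is written in Python for illustration only and is not the pipeline definition itself. It assumes that "higher environments" means UAT and Production; the names PROMOTION_PATH, APPROVAL_REQUIRED, and can_deploy are hypothetical.

```python
from typing import Set, Tuple

# Promotion order as described above: Dev -> Test -> UAT -> Prod.
PROMOTION_PATH = ("dev", "test", "uat", "prod")
# Assumption: approvals are enforced for UAT and Production only.
APPROVAL_REQUIRED = frozenset({"uat", "prod"})


def can_deploy(target: str, deployed_to: Set[str], validation_passed: bool,
               approved: bool = False) -> Tuple[bool, str]:
    """Decide whether a release may be deployed to the target environment."""
    if target not in PROMOTION_PATH:
        return False, f"unknown environment: {target}"
    # A release must have passed through every earlier environment.
    earlier = PROMOTION_PATH[:PROMOTION_PATH.index(target)]
    skipped = [env for env in earlier if env not in deployed_to]
    if skipped:
        return False, f"release has not been deployed to: {', '.join(skipped)}"
    if not validation_passed:
        return False, "validation scripts failed"
    if target in APPROVAL_REQUIRED and not approved:
        return False, f"approval required for {target}"
    return True, "ok"
```

Returning a reason alongside the decision keeps each blocked promotion explainable, which supports the traceability goal mentioned above.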
5. Parameterization Strategy
One of the key architectural strengths was deep parameterization:
- Instance sizing per environment
- S3 bucket prefixes
- Redshift cluster size
- IAM role mappings
- Network configuration
- Agent runtime configurations
Deployment scripts contain no hardcoded values.
This allows:
- Rapid environment cloning
- Predictable scaling
- Reduced risk during promotion
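A minimal sketch of this parameterization idea: shared defaults are merged with per-environment overrides, and a template is rendered strictly, so that an undefined placeholder fails the deployment instead of passing silently. The parameter names and values are hypothetical. The sketch uses Python's standard string.Template, whose substitute method raises KeyError for a missing placeholder.

```python
from string import Template

# Hypothetical defaults shared by all environments.
DEFAULTS = {"instance_type": "t3.large", "bucket_prefix": "edm", "redshift_nodes": 2}

# Hypothetical per-environment overrides; only the differences are listed.
OVERRIDES = {
    "dev": {"bucket_prefix": "edm-dev"},
    "prod": {"instance_type": "m5.2xlarge", "bucket_prefix": "edm-prod", "redshift_nodes": 4},
}


def resolve(env: str) -> dict:
    """Merge defaults with the overrides of one environment."""
    if env not in OVERRIDES:
        raise ValueError(f"no parameters defined for environment: {env!r}")
    return {**DEFAULTS, **OVERRIDES[env]}


def render(template_text: str, env: str) -> str:
    """Fill ${name} placeholders; a missing parameter raises KeyError."""
    return Template(template_text).substitute(resolve(env))
```

For example, rendering "bucket=${bucket_prefix}-data" for "dev" yields "bucket=edm-dev-data", while cloning an environment only requires adding another entry to the overrides.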
Challenges
Environment Drift
Early manual adjustments caused inconsistencies. Strict IaC enforcement resolved this.
Secrets & Credential Management
Managing secure AWS access from pipelines required structured library usage and role-based access.
Scaling Strategy
Determining when to scale vertically versus horizontally required workload benchmarking.
Cost Governance
Production required high availability, while non-production needed cost optimization mechanisms.
Release Governance
Introducing automated approvals while maintaining agility required careful pipeline design.
Results
- 100% infrastructure deployed via code
- Zero manual console configuration
- Environment provisioning time reduced from days to hours
- Deployment errors significantly reduced
- Predictable cost management across environments
- Platform capable of supporting multiple data initiatives simultaneously, with standardized Azure DevOps pipeline deployment for both infrastructure and Informatica platform components
- Informatica Runtime Environments, Connections, Schedulers, and Access Management configurations are deployed consistently through code and pipeline-driven releases
The solution now operates as a reusable Enterprise Data Management Platform in which Informatica SaaS delivers core MDM, data quality, and integration capabilities, while AWS provides the backbone through S3, IAM, runtime services, and extensible components for use-case-specific needs. Together, these services form a standardized and automated foundation for enterprise data initiatives.
Lessons Learned
Infrastructure Must Be Treated as a Product
Standardization and documentation are essential.
Parameterization is Key to Scalability
Hardcoding leads to long-term instability.
CI/CD is Not Optional
Manual deployments introduce drift and risk.
Non-Prod Cost Optimization is Critical
Auto-stop/start policies significantly reduce operational cost.
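To make the effect of an auto-stop/start policy concrete, the sketch below assumes a hypothetical schedule in which non-production instances run on weekdays from 07:00 to 19:00, and computes the resulting reduction in instance hours. The schedule and function names are illustrative assumptions; the calculation covers compute hours only.

```python
from datetime import datetime

HOURS_PER_WEEK = 24 * 7  # 168

# Hypothetical non-production schedule: Monday to Friday, 07:00 to 19:00.
START_HOUR = 7
STOP_HOUR = 19
WORKING_DAYS = frozenset({0, 1, 2, 3, 4})  # datetime.weekday(): Monday is 0


def should_be_running(now: datetime) -> bool:
    """Return True if a non-production instance should be up at this time."""
    return now.weekday() in WORKING_DAYS and START_HOUR <= now.hour < STOP_HOUR


def weekly_running_hours() -> int:
    """Hours per week the instance runs under the schedule."""
    return (STOP_HOUR - START_HOUR) * len(WORKING_DAYS)


def compute_hour_reduction() -> float:
    """Fraction of instance hours avoided compared with running 24/7."""
    return 1 - weekly_running_hours() / HOURS_PER_WEEK
```

Under these assumed hours an instance runs 60 of 168 hours per week, a reduction of roughly 64% in instance hours. Storage attached to a stopped instance continues to incur cost, so the overall saving is smaller than this figure.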
Platform Thinking Enables Growth
Building reusable infrastructure accelerates future projects.
Conclusion
This case study describes the design of a reusable and automated Enterprise Data Management (EDM) Platform that supports multiple enterprise data capabilities, including Master Data Management (MDM), Data Quality (DQ), integration, and analytics. Instead of creating isolated technical environments for each initiative, the platform was designed as a standardized foundation that can support a wide range of use cases through a consistent delivery model.
At the center of the platform is Informatica SaaS, which provides the core data management capabilities. AWS serves as the underlying backbone for runtime support, storage, access control, and supporting technical services. Core AWS components include Amazon S3 for storage and AWS IAM for secure access management, while compute services such as Amazon EC2 host runtime components like Informatica Secure Agents. Additional AWS services such as DynamoDB, SNS/SQS, and Athena can be introduced depending on the specific EDM use cases being supported.
A key characteristic of the solution is the strong emphasis on automation. The platform is implemented using Infrastructure as Code (IaC) and Azure DevOps YAML pipelines, enabling repeatable, parameter-driven deployments across development, test, UAT, and production environments. The automation scope includes AWS infrastructure components as well as Informatica platform elements such as Runtime Environments (Secure Agents), Connections, Schedulers, and Access Management through groups and roles.
From a compute perspective, the platform team automates EC2 configuration and contributes to architecture decisions regarding instance sizing and the number of instances required. This ensures that the runtime layer is aligned with the business and technical workloads that each EDM use case must support, while maintaining a high degree of standardization and operational consistency.
The result is a scalable, secure, and reusable platform that reduces manual effort, improves deployment consistency, shortens environment delivery timelines, and enables multiple enterprise data initiatives to run on a common technical foundation. By combining Informatica SaaS, AWS backbone services, and Azure DevOps automation into a unified operating model, the organization established an EDM platform that can grow with evolving business and data management requirements.

