Monday, November 25, 2024

Establishing a Secure Foundation for Kupala-Nich




M 917 536 3378
maksim_kozyarchuk@yahoo.com



    Establishing a secure foundation for the Kupala-Nich (https://kupala-nich.com) application is a critical step in transforming it from a demo into a multi-tenant application. This article outlines the security measures in place, focusing on access control, data protection, and API Gateway security. Maintaining the integrity and confidentiality of the system is my top priority.


1. Application Access Security

Securing access to the Kupala-Nich application is the first line of defense. The following practices ensure robust access control:

  • Multi-Factor Authentication (MFA) is required for all AWS accounts accessing Kupala-Nich, reducing the risk of unauthorized access.

  • Deployment Management is handled through OpenID Connect (OIDC), which ties deployments to specific roles and avoids the need to create API access keys.

  • Least Privilege Principle is applied to all Lambda functions, granting them only the permissions needed for their specific tasks.

  • AWS CloudShell is used for administrative access, eliminating the need for static access keys.

Dynamic Policies

Lambda permissions are defined via CloudFormation templates, which are managed in code. Regular policy reviews are planned to maintain alignment with best practices and prevent privilege escalation.
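As an illustration of the least-privilege principle, below is a sketch of the kind of per-function policy statement these templates express. The table name, account ID, region, and action list are hypothetical placeholders, not the actual Kupala-Nich policies.

```python
# Sketch of a least-privilege policy statement for a single Lambda,
# of the kind the CloudFormation templates define in code.
# Account ID, region, and table name are illustrative placeholders.
import json

def position_table_policy(account_id: str, table: str) -> dict:
    """Grant only the DynamoDB actions this particular function needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # No dynamodb:* wildcard: only the calls this function makes.
            "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
            "Resource": f"arn:aws:dynamodb:us-east-1:{account_id}:table/{table}",
        }],
    }

print(json.dumps(position_table_policy("123456789012", "Position"), indent=2))
```

Reviewing these statements in code makes privilege escalation visible in diffs rather than buried in the console.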


2. Securing Data in AWS

The Kupala-Nich application stores all sensitive data in DynamoDB, leveraging its robust security features. DynamoDB automatically encrypts data at rest and in transit, meeting strict encryption compliance and regulatory requirements.

Position and Transaction Data

After evaluating additional encryption layers for position and transaction data, I decided against them for the following reasons:

  • Existing Security: DynamoDB already provides a secure storage solution.

  • Development Complexity: Encrypting individual fields would require converting native DynamoDB data types into encrypted strings, complicating development and debugging processes.

  • Performance Impact: Decrypting data on demand would introduce latency, degrading the user experience.

  • Cost: Frequent encryption and decryption operations with AWS Key Management Service (KMS) would lead to significant costs.

PII and Other Highly Sensitive Data

While encryption secures the data within AWS, it does not protect against misuse of compromised accounts or roles. For highly sensitive data, such as personally identifiable information (PII) or vendor credentials, an additional encryption layer will be implemented.
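The planned additional layer follows the standard envelope-encryption pattern: each sensitive field is stored as ciphertext alongside a KMS-wrapped data key. The sketch below shows only the record structure; a base64 placeholder stands in for the real cipher so it is runnable without AWS, and in production the data key would come from KMS GenerateDataKey with an AES-GCM cipher.

```python
# Illustrative sketch of the envelope-encryption record layout for PII
# fields. The "cipher" here is a base64 PLACEHOLDER, not encryption;
# it exists only to make the stored structure concrete and runnable.
import base64
import secrets

def encrypt_field(plaintext: str, wrapped_key: bytes) -> dict:
    """Return the fragment stored in DynamoDB: ciphertext plus the
    wrapped data key needed to decrypt it later."""
    ciphertext = base64.b64encode(plaintext.encode()).decode()
    return {
        "ciphertext": ciphertext,
        "wrapped_key": base64.b64encode(wrapped_key).decode(),
        "scheme": "PLACEHOLDER",  # e.g. "AES-GCM" in production
    }

def decrypt_field(fragment: dict) -> str:
    """Inverse of encrypt_field; real code would first unwrap the key via KMS."""
    return base64.b64decode(fragment["ciphertext"]).decode()

fragment = encrypt_field("ssn:123-45-6789", secrets.token_bytes(32))
assert decrypt_field(fragment) == "ssn:123-45-6789"
```

Storing the wrapped key with the record keeps decryption gated on a KMS call, so a leaked table dump alone is not enough to read the PII.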

Controlled Data Access

Access to data is only available through API Gateways with built-in authentication and authorization, ensuring users can retrieve only the data they are authorized to access. More details on this are covered in the next section.

Private Environments

For users requiring additional security guarantees, private environments can be created. This builds on the existing separation of production and development environments to enhance data isolation.


3. Securing API Gateway Access (Under Construction)

Both WebSocket (WS) and HTTP API Gateways manage user interactions with the Kupala-Nich application. Several layers of security will ensure the protection of data:

Authentication with Cognito

  • Users authenticate via AWS Cognito, which issues JSON Web Tokens (JWTs) for session management.

  • During WebSocket $connect, JWTs are validated, and a mapping is established between the connection ID and the authenticated user.

  • Subsequent interactions within the session/connection use the established authentication.

  • All communication through WS and HTTP API Gateways is secured with SSL.
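The $connect bookkeeping described above can be sketched as follows. Note the heavy caveat: a real authorizer must verify the token signature against the Cognito JWKS before trusting any claim; this sketch only parses the (unverified) payload to show the connection-to-user mapping.

```python
# Sketch of the WebSocket $connect flow: extract the JWT claims and
# record a connection_id -> user mapping used by later messages.
# WARNING: signature verification against the Cognito JWKS is omitted
# here; production code must verify before trusting "sub".
import base64
import json

connections = {}  # connection_id -> authenticated user ("sub" claim)

def _claims(jwt_token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload = jwt_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def on_connect(connection_id: str, jwt_token: str) -> None:
    connections[connection_id] = _claims(jwt_token)["sub"]
```

Once the mapping exists, every subsequent frame on that connection can be attributed to the user without re-validating a token.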

Authorization based on data ownership

  • Portfolios, datasets, and other resources are tied to user ownership, enabling simple checks to determine whether a user is authorized to access specific data.

  • Lambda functions ensure that only authorized data is returned, relying on defined schemas that include user IDs for validation.

  • Queries to DynamoDB are structured such that either the sort key or an index is based on the user ID, making it natural to retrieve data for authorized users only.

  • Automated tests validate API endpoints to ensure only the appropriate data is returned to each user.
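The key-schema point above is worth making concrete: when the user ID is part of the DynamoDB key, the query itself can only ever return that user's rows. The toy below uses an in-memory stand-in for the table; the attribute names (user_id, portfolio_id) are illustrative, not the real schema.

```python
# In-memory stand-in for a DynamoDB table keyed (user_id, portfolio_id).
TABLE = {
    ("user-1", "pf-a"): {"value": 100},
    ("user-1", "pf-b"): {"value": 250},
    ("user-2", "pf-c"): {"value": 999},
}

def query_portfolios(user_id):
    """Equivalent of a Query with KeyConditionExpression 'user_id = :u':
    rows belonging to other users are unreachable by construction."""
    return [
        {"portfolio_id": pf, **item}
        for (uid, pf), item in TABLE.items()
        if uid == user_id
    ]
```

Because authorization lives in the key condition rather than a post-filter, there is no code path that accidentally returns another user's data.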

Additional enhancements

API Gateway will not initially require API keys. However, API keys may be introduced in the future to enable throttling and rate-limiting of API calls, adding an extra layer of security on top of JWT-based authorization.


Combining Security Measures

Kupala-Nich employs a combination of application access security, data security, and API security to provide a comprehensive approach to safeguarding the application and its data.

Feedback and critique are encouraged to ensure these measures remain effective and resilient against emerging threats. As the implementation of API Gateway security is further developed, this document will be updated with additional details and implementation specifics.



Sunday, November 17, 2024

Connecting GitLab and AWS

 










     Bypassing the GitLab vs GitHub and GitLab vs AWS CodePipeline debates, this article focuses on the logistics of connecting GitLab, acting as the CI/CD system, to AWS for deployments. One way to achieve this is to add AWS access keys as variables in GitLab; while this works, it is not a recommended practice from a security perspective. Expanding on why is beyond the scope of this article; I will just add that the accounts/roles needed for AWS deployment are typically quite powerful, and should they be compromised, the damage is likely to be extensive.

The recommended option is to connect AWS with GitLab using the OpenID Connect protocol. With this setup, GitLab acts as both the OpenID authenticator and the initiator of the authentication requests. AWS is then configured to trust GitLab as the authenticator for a specific role, guarded by three aspects:

  • requests come from the specific GitLab instance configured for the role

  • requests come from a particular group, project, or branch

  • optionally, a secret key also known as the Audience. AWS advertises this as the key aspect of security, but while GitLab supports it, the value can easily be spoofed. This point causes quite a bit of confusion in getting the handshake set up if you follow the official AWS docs and forums.

There are generally five steps to getting this setup configured, and I would recommend you take a deep breath and allocate a good portion of a day to get through them.

Step 1: Establish a Client ID, or Audience. This step is optional in practice, but appears crucial from reading the relevant articles. To start, I would highly recommend reading this AWS article; while it is misleading on a couple of points, namely that "you must register your application with the IdP to receive a client ID" and that /.well-known/openid-configuration must support id_token, it is generally a well-written article that gives you a good overview of the process. If you want to go ahead with creating a Client ID, you can do so by logging on to your GitLab instance, going to your group level, then Settings -> Applications, and creating a new application. The one confusing point you will encounter is entering a value for the callback URL; feel free to enter https://localhost as it will not be used. The reference document from GitLab can be found here: https://docs.gitlab.com/ee/integration/oauth_provider.html


Step 2: Add a new Identity Provider in AWS IAM. I would refer you back to the article referenced in Step 1, as it provides a reasonable description of the process. A few notes on this doc: Step 1, as mentioned, is optional, and if you skip it, then for step 6 (Audience) you can add any value; it is simply there to create a placeholder policy that you will then edit via JSON.

Step 3: Finish setting up the AWS role. With Step 2, you created an AWS role, but that role relies on the Audience, which with GitLab is not much more secure than storing AWS access keys as GitLab variables. To secure it further, you should restrict access to particular groups, projects, or branches. I would refer you to this article on GitLab to learn more. Below is a sample AWS role setup you will end up with. If you set up a client ID/Audience in Step 1, you can keep it using the template below, or you can simply skip it. In the example below, the client ID is 'My Favorite Client', the group is 'superproj' hosted on gitlab.com, and all repos and branches within the group can assume the AWS role.

"Action": "sts:AssumeRoleWithWebIdentity",

"Condition": {

    "StringEquals": {

        "gitlab.com:aud": "My Favorite Client"

    },

    "StringLike": {

        "gitlab.com:sub": "project_path:superproj/*:ref_type:branch:ref:*"

    }

}


Step 4: Get your .gitlab-ci.yml set up. The https://docs.gitlab.com/ee/ci/cloud_services/aws/ page offers an example for setting up the pipeline, but I found a few issues with it.

  • It doesn't say which image to use; I found another doc from GitLab that provides the relevant answer: https://docs.gitlab.com/ee/ci/cloud_deployment/.

  • The parsing logic for assigning the response of aws sts assume-role-with-web-identity to environment variables did not work for me; I used the approach below instead.

  • It suggests setting the 'aud' section of GITLAB_OIDC_TOKEN to https://gitlab.com; this doesn't work. Instead, you need to set it to whatever your AWS policy is set to, or 'My Favorite Client' sticking with the example I started with.

I found the template below to work for me. jq is included in the AWS base image, but if you need to add it to your own image, that is not a big deal either.

  image: public.ecr.aws/sam/build-python3.11

  id_tokens:

    GITLAB_OIDC_TOKEN:

      aud: "My Favorite Client"

  variables:

    ROLE_ARN: "arn:aws:iam::<your account>:role/<NameOfYourRole>"

  script:

    - CREDS=$(aws sts assume-role-with-web-identity --role-arn ${ROLE_ARN} --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}" --web-identity-token ${GITLAB_OIDC_TOKEN} --duration-seconds 3600 --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output json)

    - export AWS_ACCESS_KEY_ID=$(echo $CREDS | jq -r '.[0]')

    - export AWS_SECRET_ACCESS_KEY=$(echo $CREDS | jq -r '.[1]')

    - export AWS_SESSION_TOKEN=$(echo $CREDS | jq -r '.[2]')
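If your image lacks jq, the same parsing can be done with a few lines of Python instead. This sketch mirrors the three jq lines above; the three-element array shape comes from the --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' projection in the assume-role call.

```python
# Python equivalent of the jq parsing above, for images without jq.
import json

def parse_creds(creds_json):
    """Map the 3-element JSON array printed by
    `aws sts assume-role-with-web-identity --query 'Credentials.[...]'`
    to the standard AWS environment variable names."""
    key_id, secret, token = json.loads(creds_json)
    return {
        "AWS_ACCESS_KEY_ID": key_id,
        "AWS_SECRET_ACCESS_KEY": secret,
        "AWS_SESSION_TOKEN": token,
    }
```

In the pipeline you would pipe $CREDS into a small script that prints `export KEY=value` lines and eval the result.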


Step 5: Add the required permissions to your AWS role. This is something I am still working through, but the entitlements required are extensive.

Is this better than storing AWS keys in GitLab? Not if you rely only on an aud value that can be overridden in your pipeline. I would recommend restricting the role further to only the protected deployment branch, and making sure that only the people you want to empower with the powerful AWS deployment role are allowed to merge into that branch. There is also the question of the security of the GitLab server itself and user access to it, but that feels like a risk on the same level as securing access to AWS. At that level, you should be concerned about protecting sensitive data with encryption, and perhaps less concerned about the cost of someone spinning up unexpected AWS resources on your tab, or even bringing down an application that you should be able to recover from backups.

















Thursday, November 14, 2024

Building on AWS







When I started developing the Kupala-Nich platform, I knew I wanted to leverage AWS’s serverless environment. New to serverless technology, I spent considerable time refining the platform’s architecture, learning the limitations of certain tools, and discovering AWS components along the way. Here, I’ll share the current architecture of the platform and invite feedback on whether best practices or AWS stack components could further enhance its robustness, security, and efficiency. Below is a high-level diagram of the Kupala-Nich application architecture followed by a brief description of the components.


Platform Architecture Overview

The frontend is a React application that maintains a WebSocket connection to the API Gateway, handling most server interactions. The backend consists of several Lambda functions, DynamoDB, and S3 storage.
  • WS Lambda: A lightweight Lambda function that exposes various endpoints for the UI. Built in Python without dependencies beyond the standard library, on average it responds in under 50ms and requires less than 100MB of RAM.
  • PyCaret Lambda: Deployed via Docker, this function packages PyCaret for ML analysis, taking 3-5 minutes to execute with ~500MB of RAM. Training datasets and generated analysis are stored in S3.
  • CalcPosition Lambda: Triggered by DynamoDB Streams upon updates to the Position and Market Data tables, it calculates positions and P&L values, updates the CalcPosition table, and publishes results to WebSocket clients. Although light, it will scale as position complexity increases.
  • EOD & YFinance Lambdas: These are event-triggered by timers. YFinance refreshes market data, and EOD snapshots positions to the EODPosition table and performs maintenance, such as rebalancing and closed position aggregation. The YFinance Lambda requires pandas and yfinance libraries, so it’s deployed as a zip package.
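To make the CalcPosition trigger concrete, here is a minimal sketch of the handler shape for a DynamoDB Streams event. The attribute names and the P&L formula are simplified placeholders, not the actual Kupala-Nich schema, and publishing to WebSocket clients is omitted.

```python
# Sketch of a DynamoDB Streams-triggered handler in the shape of
# CalcPosition: read changed Position images, recompute a simple P&L.
# Field names and the formula are illustrative placeholders.
def handler(event, context=None):
    results = []
    for record in event.get("Records", []):
        # Streams delivers INSERT/MODIFY/REMOVE; only recompute on writes.
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        img = record["dynamodb"]["NewImage"]  # DynamoDB-typed attributes
        qty = float(img["quantity"]["N"])
        px = float(img["price"]["N"])
        cost = float(img["avg_cost"]["N"])
        results.append({
            "position_id": img["position_id"]["S"],
            "pnl": round(qty * (px - cost), 2),
        })
    return results  # real code would write CalcPosition and push to WS clients
```

Note the stream image uses DynamoDB's typed wire format ({"N": "10"}), which is why each attribute is unwrapped before arithmetic.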


Code Repositories & CI/CD Pipeline
The platform’s complexity lies not only in its architecture but in the automation of CI/CD pipelines and management of permissions. Here’s an overview of the code structure and CI/CD practices:
  • Frontend Repo: This contains all React code. The CI/CD pipeline is straightforward, running npm install, tests, npm build, and finally deploying the build to the Apache server. The pipeline, built on Node, completes in about two minutes, with most time in npm install and build steps. Unit test coverage is moderate, focusing on formatting logic and component stability across changes.
  • Backend Repo: This includes all backend code, except for PyCaret-specific functions, which are in a separate repository. Each Lambda has its own Python package with shared components in a common package. Test coverage is extensive; most integration-level scenarios use Moto for AWS mocking. Lightweight Lambdas share a package with different entry points based on the trigger type. The YFinance Lambda is packaged separately due to dependencies. The repo also includes a CloudFormation template that defines the application’s tables, Lambdas, API Gateway configurations, and security roles. CI/CD here includes a test stage for type checking and validation and a deployment stage for package and SAM deployments.
  • PyCaret Repo: This houses code for PyCaret-based analysis and data retrieval. The Lambda can be triggered by WebSocket API Gateway or AWS Step Function events. Test coverage is minimal as the focus is on PyCaret invocations. Docker packaging takes about 7 minutes due to PyCaret’s dependencies. To expedite testing, different base images are used for test and package steps. This repo also includes two CloudFormation templates, one for the Lambda and another for the AWS Step Function definition.


Next Steps: Scalability & Security
As the Kupala-Nich platform evolves from a demo to a production product, scaling and security will be crucial focuses. I’ll cover these topics in detail in a future article. If you would like access to the code repo, please reach out to me by email.