Secure Engineering
A solid understanding of secure engineering is essential to delivering reliable services. Of all the aspects of engineering, security is in some ways the hardest. Secure engineering spans a wide array of techniques, from ways of working to deeply technical controls, and a lapse in any single area can have serious consequences.
This article covers secure engineering practices, tools and techniques, including how to continually monitor and improve security. It is not an exhaustive list, but the areas covered give a good starting point for what to think about. We have focused on AWS and GitHub when making recommendations, but most of our recommendations will apply to other services too.
Among the sources used in compiling this article, two stood out:
- We love the NCSC's Secure development and deployment guidance when thinking about and implementing secure engineering.
- AWS provides excellent security guidance as part of the AWS Well-Architected Security Pillar (PDF).
Note: Please do check any cost implications and conditions of use associated with any tools before enabling or configuring them.
Essential reading
OWASP Top 10
The OWASP Top 10 is the standard reference for building secure web applications. It gives clear descriptions of potential vulnerabilities and how to guard against them, and is essential reading for all software engineers. This article outlines secure engineering practices but does not duplicate the detail from the OWASP Top 10, knowledge of which is an assumed prerequisite.
Regulations
Working knowledge of applicable regulatory requirements is also assumed, in particular:
- The General Data Protection Regulation (GDPR), which governs the handling of personal data.
- The Privacy and Electronic Communications Regulations (PECR), which place restrictions on the use of cookies and similar technologies.
Legal regulations which govern the delivery of software services have strengthened significantly in recent years and it is essential to understand how they affect the way you engineer applications.
Testing
As with writing good code, doing security well involves continual testing — in many cases using the tests to steer implementation.
The OWASP Web Security Testing Guide is an extensive and wide-reaching reference on how to test for security, including examining the software delivery process and reviewing code, as well as more traditional black-box penetration testing. It is a large resource, but is worth investing some time in for the security-conscious.
System design
Data minimisation
Consider how you can reduce the security burden of your system by reducing the data being handled to the bare minimum needed to do the job. Remember that data minimisation is one of the GDPR principles for personal data, and is good practice for any potentially sensitive data.
Reduce attack surface
Design the system to reduce its attack surface, including favouring managed services by default.
Web applications
- Ensure the basics from the OWASP Top 10 are covered, including guarding against injection and cross-site scripting attacks.
- Session tokens should be stored securely with the `Secure`, `HttpOnly` and `SameSite` flags set (see the sketch after this list).
- Sessions should be refreshed on privilege escalation to avoid session hijacking/fixation.
- It is generally best to outsource identity management and authentication, but if you are implementing it yourself then:
- Passwords should be hashed with a salt.
- Timing attacks on authentication should be considered, potentially using a WAF to mitigate them.
- Consider responding with 404, or a blanket 403, when the user is not authenticated, to avoid leaking information as to whether resources exist.
- Prevent clickjacking with the `X-Frame-Options` header.
- Design the domain structure to prevent cookies leaking from production environments to non-production environments. For example, avoid non-production environments being on a subdomain of the production domain.
- Ensure information is not being leaked in the likes of error messages, stack traces, or headers.
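To make the cookie, header, and password-hashing points above concrete, here is a minimal sketch using Flask (an assumed framework choice; the route, cookie name and scrypt parameters are illustrative, not prescriptive). As noted above, outsourcing identity management is generally preferable; this simply shows the flags and hashing in question.

```python
import hashlib
import secrets

from flask import Flask, make_response

app = Flask(__name__)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with a random per-user salt (never store plaintext)."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

@app.post("/login")
def login():
    # ...verify credentials against the stored salt and hash, then issue a
    # brand-new session token (also refresh the token on privilege change)...
    response = make_response("ok")
    response.set_cookie(
        "session",
        secrets.token_urlsafe(32),
        secure=True,        # only sent over HTTPS
        httponly=True,      # not readable from JavaScript
        samesite="Strict",  # not sent on cross-site requests
    )
    response.headers["X-Frame-Options"] = "DENY"  # clickjacking protection
    return response
```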
Secure Software Supply Chain
A secure "software supply chain" is vital to a secure production system. Code is the core of everything that runs within an environment, and it is essential that we understand what software we are using, including ensuring our proprietary code is designed with security in mind.
The software supply chain includes:
- Who contributed the code, and when they contributed.
- How the code was reviewed for security issues.
- Whether the code has any known vulnerabilities.
- Whether up-to-date versions of all dependencies are available and being used.
- Any license information.
- Everything that interacts with the code, including build tools, pipelines, and deployment automation.
Proprietary Code
Proprietary code is the term used for any code that is written within or specifically for your organisation. In order to maintain a secure supply chain we must ensure that all code is:
- verified as coming from a trusted source.
- reviewed before it makes it into a privileged environment.
The recommendations below largely assume the use of GitHub, but may also apply to other hosted repositories.
- Only use private repositories unless your organisation has decided that specific code should be open source.
- Set organisation settings so members can only create private repositories to prevent accidentally exposing code publicly.
- Prevent users from transferring repositories, changing their visibility, or forking a repository.
- Prevent members from deleting repositories to avoid unwanted loss of code.
- Enable and enforce GPG signing of commits to ensure authors' identities can be verified.
- Enable and mandate SSO login within your organisation, if practical.
- Require multi-factor authentication enforced at the Organisation level to reduce the risk of unauthorised access.
- Protect all main branches:
- Do not allow history to be rewritten with force pushes, in order to prevent loss of information.
- Enforce at least one code review for a merge and prevent admin override for merging to ensure the quality process is followed.
- Only allow a merge when build and tests are successful to ensure only validated code can make its way into a deployable artefact.
- Use a secret manager where necessary; never store secrets inside a repository in code (see the sketch after this list).
- Enable GitHub secret scanning to find secrets that have been committed and remove them (see Removing sensitive data from a repository).
- Consider scanning proprietary code using static code analysis tools such as GitLab's built-in SAST, ESLint or SonarQube.
- OWASP says, "Source code analysis tools, also referred to as Static Application Security Testing (SAST) Tools, are designed to analyze source code or compiled versions of code to help find security flaws."
- Be aware of the limitations of these tools. There are many classes of vulnerabilities against which they are not effective, and they have a relatively high false positive rate.
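As an illustration of the secret manager point above, here is a minimal sketch of retrieving a secret at runtime with boto3 rather than committing it to the repository (the secret name is hypothetical):

```python
import boto3

def get_api_key() -> str:
    """Fetch a secret at runtime instead of storing it in source control."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId="prod/example/api-key")  # hypothetical name
    return response["SecretString"]
```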
Build dependencies
Most software projects involve a significant number of dependencies, including open-source code libraries written by the community, and potentially container base images. Any given version of a dependency can contain vulnerabilities which may have been fixed in later versions, so it is important to stay up to date.
- Reference a specific version for every dependency rather than 'latest' or similar to avoid unexpected changes. Where possible, use a hash to reference the specific version rather than a simple version identifier to ensure the dependency version being targeted cannot be changed without your knowledge.
- Consider storing dependency artefacts in a local repository such as Artifactory or Nexus rather than downloading directly from the internet on every build. This reduces the risk of tampering or unexpected changes.
- Scan direct and 'transitive' dependencies (dependencies of dependencies) for known CVEs (Common Vulnerabilities and Exposures) on every build, or on a schedule (e.g. nightly) for code not under active development; a minimal scanning sketch follows this list.
- Consider using GitHub's Dependabot to automatically upgrade dependencies, with the option to limit to security fixes, if preferred.
- If using containers, scan images for vulnerabilities using the likes of Clair, Snyk, Docker Hub, or AWS's built-in vulnerability scanners.
- Any code or container CVE with Common Vulnerability Scoring System (CVSS) severity of high or critical should fail a build if run as part of a pipeline, or alert support engineers if part of a scheduled scan.
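As one way to implement the dependency scanning described above, here is a minimal sketch that checks pinned Python dependencies against the OSV vulnerability database (the pins are illustrative; a real gate would parse a lock file and filter findings by CVSS severity before failing the build, as recommended above):

```python
import sys

import requests

# Illustrative pins; in practice these would be parsed from a lock file.
PINNED = {"requests": "2.19.0", "flask": "2.0.1"}

def known_vulnerabilities(name: str, version: str) -> list:
    """Query https://osv.dev for known vulnerabilities in a pinned version."""
    response = requests.post(
        "https://api.osv.dev/v1/query",
        json={"package": {"name": name, "ecosystem": "PyPI"}, "version": version},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("vulns", [])

if __name__ == "__main__":
    failed = False
    for name, version in PINNED.items():
        vulns = known_vulnerabilities(name, version)
        if vulns:
            failed = True
            print(f"{name}=={version}: {[v['id'] for v in vulns]}")
    sys.exit(1 if failed else 0)  # a non-zero exit fails the pipeline step
```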
Scan running software
Consider automated penetration testing in your build pipeline, e.g. using OWASP ZAP.
- Automated penetration testing tools work by running the built software in a sandbox environment (either within the CI/CD system or in a cloud platform) and probing its HTTP endpoints for potential vulnerabilities.
- As with other automated checks, this should fail a pipeline build if vulnerabilities are found (see the sketch below).
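A sketch of such a step, running OWASP ZAP's baseline scan from a pipeline against a sandbox deployment (the container image tag and target URL are assumptions; check the current ZAP documentation):

```python
import subprocess
import sys

TARGET = "https://test.example.internal"  # hypothetical sandbox deployment

# Run the ZAP baseline scan in a container; it probes the target's HTTP
# endpoints for common vulnerabilities.
result = subprocess.run([
    "docker", "run", "--rm", "-t",
    "ghcr.io/zaproxy/zaproxy:stable",
    "zap-baseline.py", "-t", TARGET,
])

# zap-baseline.py exits non-zero when it finds problems, failing the build.
sys.exit(result.returncode)
```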
Deployment
This section covers the steps from a build completing and being successfully validated, up to when the software is running in an environment.
- Deployments should be entirely automated. Humans should not be permitted to deploy to any environment except during an emergency when they have been granted "break-glass" time-limited elevated permissions.
- A single build artefact should be "promoted" between environments so the same code is running everywhere for a given version; there must be no environment-specific builds. This helps to ensure that testing done in one environment is valid in other environments, and removes the need for tests to be re-run in every environment.
- Access to CI/CD tooling must be restricted to authorised people, and authentication should be robust.
- CI/CD tooling must have restricted permissions: just enough to deploy to each environment, and no more. In AWS it is preferred to rely solely on ambient IAM permissions or, where that is not possible, to use STS to request temporary credentials for the duration of the deployment (see the sketch after this list).
- Code artefacts should be signed and signatures verified before deployment and at runtime.
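A minimal sketch of the STS approach for a deployment step, assuming boto3 and a placeholder role ARN:

```python
import boto3

# Exchange the pipeline's ambient identity for short-lived credentials
# scoped to a deployment role (the ARN is a placeholder).
sts = boto3.client("sts")
credentials = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/example-deploy-role",
    RoleSessionName="pipeline-deploy",
    DurationSeconds=900,  # just long enough for the deployment
)["Credentials"]

# Use the temporary credentials for the deployment itself; they expire
# automatically, so nothing long-lived is stored in the CI system.
s3 = boto3.client(
    "s3",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)
```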
Infrastructure and runtime dependencies
- Where applicable, use tools like Qualys Vulnerability Management or Amazon Inspector to scan virtual machines for vulnerabilities.
- If using AWS Elastic Beanstalk, use managed platform updates to apply minor and patch platform versions and keep EC2 instances up to date on a weekly schedule.
- Cloud virtual machine images that are used should be from trusted providers or custom-built private encrypted images. Third-party images should largely be avoided. If a third-party image must be used, it should be copied and then reviewed, with the copy then used in lieu of the public version.
Quality and security gates
We have discussed the quality gates required as part of a secure software supply chain. These validate proprietary code, and external dependency code and software, before artefacts can be promoted to any environment.
These gates should be introduced gradually while monitoring false positives, to avoid eroding delivery teams' trust in security practices. The DSOMM (DevSecOps Maturity Model) can help with measuring current maturity, and GitHub has provided a PDF on how to implement Level 1 maturity.
Any continuous integration and deployment (CI/CD) pipeline within a project should contain the following things where applicable to that application, and all should pass satisfactorily before deployment is permitted:
- A full build of the application including tests such as unit, integration, component, end-to-end, and contract.
- Linter for code style checks (e.g. ESLint).
- Validation of no critical or high severity vulnerabilities in dependencies (see Build dependencies).
- Validation of no critical or high severity vulnerabilities in the build artefact (see Scan running software).
In addition to the above, consider using GitHub Advanced Security, which enables the semantic code analysis engine CodeQL to scan for security vulnerabilities such as untrusted data flowing into dangerous sinks.
Here is an example of a CI/CD pipeline that uses all of the above recommendations:
[Diagram: an example CI/CD pipeline implementing the recommendations above]
This is a description of what is happening at every stage:
- A human merges a pull request in GitHub once a code review is complete. Merging should not be possible until a build has passed successfully; this build should include steps 2 and 3 as described in the diagram.
- The code is built, along with all dependencies required to produce a working artefact.
- The code is tested. The various test steps are parallelised where possible, in order to increase the speed of the pipeline and provide engineers with rapid feedback.
- Test — this step comprises all of the functional and non-functional testing (unit, integration, component, contract, end-to-end, and performance testing).
- Linting — any rules around code styling can be applied in this step.
- SAST code scan — a Static Application Security Test, conducted by CodeQL, to check for vulnerabilities such as SQL injection or cross-site scripting.
- Dependency vulnerability check — a check to ensure that all included dependencies are not vulnerable to a known critical or high CVE.
- Artefact vulnerability check — a check to ensure the built artefact doesn't contain any known critical or high CVE (usually only valid in Docker images, AMIs or similar).
- The artefact is released and stored. It is versioned and promoted between environments now that it has passed all checks and is trusted for deployment.
- The artefact is immediately deployed to the test environment for manual validation, if required.
- The deployment is smoke tested. The new deployment is automatically tested with a battery of quick and simple smoke tests. This verifies important user journeys to ensure the application is working as expected.
- The artefact is deployed to the pre-production environment, as above.
- The deployment is smoke tested in the pre-production environment, as above.
- Manual approval is given for a deployment to the production environment. This manual step can be removed once there is a high level of confidence and a low rate of false positives in the testing workflow. Once this step is removed, the pipeline will truly be a continuous deployment pipeline.
- The artefact is deployed to the production environment, as above.
- The deployment is smoke tested in the production environment, as above.
Cloud infrastructure
This section focuses largely on AWS, but some of the recommendations may be applicable to other cloud environments or hosting methods.
AWS provides a number of very useful resources including the AWS Well-Architected Security Pillar, and the Security Architecture Whitepaper. We highlight below some of the key recommendations, whilst also outlining the relevant key principles.
It is generally recommended that all audit and security logs go to a separate security or logging account, and that all centralised AWS security tools are also operated from this security account where possible. See this AWS white paper on the recommended AWS account patterns.
Implement a strong identity foundation
Implement the principle of least privilege and enforce separation of duties with appropriate authorisation for each interaction with your AWS resources. Centralise identity management and aim to eliminate reliance on long-term static credentials.
- Use SSO for AWS using the multi-account identity account structure pattern to control access to all AWS accounts.
- Use SCPs (Service Control Policies) within the AWS Organisation root account to establish clear IAM guardrails for accounts (e.g. deny `cloudtrail:StopLogging`, or restrict access to only one specific AWS region); see the sketch after this list.
- Ensure all AWS root accounts have MFA enabled and have passwords set.
- Reduce the use of static credentials such as IAM access keys for any access. Human access should be done via AWS STS to obtain automatically expiring credentials.
- Human and machine identities should not be shared. Create separate service users or roles in IAM where necessary.
- Enable and configure AWS Config to monitor for long-standing static credentials (IAM access keys and passwords) and notify support engineers that they require rotation.
- Store secrets securely. AWS Secrets Manager provides a way to store and retrieve secrets for applications. Alternatively, Lambda environment variables can be encrypted using an AWS KMS Customer Managed Key.
- Role and policy documents should always follow the principle of least privilege.
- Use AWS IAM Access Analyzer to reduce the scope of IAM roles and policies where particular services are not being used by certain roles.
- While operating under the principle of least privilege, things can go wrong and users may need more access to resolve incidents or issues. Implement an automatic break-glass or emergency access mechanism. This may be implemented by allowing users to add themselves to specific IAM groups to escalate privileges; such escalation should be monitored and alerted on using AWS CloudTrail.
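To illustrate the SCP guardrail mentioned above, here is a sketch that creates and attaches a deny policy from the management account using boto3 (the policy content, names and OU identifier are placeholders):

```python
import json

import boto3

# Deny actions that would tamper with audit logging, organisation-wide.
GUARDRAIL = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyCloudTrailTampering",
        "Effect": "Deny",
        "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
        "Resource": "*",
    }],
}

orgs = boto3.client("organizations")
policy = orgs.create_policy(
    Name="deny-cloudtrail-tampering",
    Description="Prevent member accounts from disabling audit logging",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(GUARDRAIL),
)
orgs.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-examplerootid",  # placeholder OU or account id
)
```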
Enable traceability
Monitor, alert, and audit actions and changes to environments in real-time. Integrate log and metric collection with systems to automatically investigate and take action.
- Enable AWS CloudTrail. It is recommended to enable an organisational trail to mandate AWS API access logging in every AWS account in the organisation, following the multi-account logging account structure pattern (see the sketch after this list). Use a Service Control Policy to disable all non-essential write access to the logging account, and use S3 Object Lock to prevent logs from being modified.
- Enable VPC Flow Logs. These can generate significant volumes of logs so it may be desirable to only record blocked connections, depending on your ingestion approach.
- Enable AWS Config to monitor and record all AWS resource configuration changes.
- Enable Amazon GuardDuty to automatically detect threats and unauthorised behaviour.
- Use AWS Security Hub, which provides a single view of all security findings from across most AWS services.
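A sketch of enabling an organisational trail with boto3 (the names are placeholders, and the destination bucket would live in the central logging account):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# An organisation trail records API activity for every account in the org.
cloudtrail.create_trail(
    Name="org-audit-trail",             # placeholder name
    S3BucketName="central-audit-logs",  # bucket in the logging account
    IsOrganizationTrail=True,
    IsMultiRegionTrail=True,
    EnableLogFileValidation=True,       # tamper-evident digest files
)
cloudtrail.start_logging(Name="org-audit-trail")
```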
Apply security at all layers
Apply a defence in depth approach with multiple security controls. Apply to all layers, including network edge, VPC, load balancing, every instance and compute service, operating system, application, and code.
- S3 buckets should never be public, even to host a website; use CloudFront to expose the bucket instead (see the sketch after this list).
- Public and private subnets should be used to draw clear distinctions between publicly-accessible and non-public services. RDS, EC2 instances and other services should largely be private, with external access provided by AWS managed services such as API Gateway or ALB.
- Different environments must not be permitted to communicate with each other. For example, traffic from production must not pass to the test environment, and vice versa.
- In general, SSH access should not be used for EC2 instances. Instead, use AWS Systems Manager Session Manager for secure, auditable access.
- Enable and configure Amazon Inspector in each account to periodically scan all EC2 instances for vulnerabilities. It should notify teams directly, based on tagging, for triage and resolution. Recommended initial rulesets are Network Reachability, Common Vulnerabilities and Exposures, and Security Best Practices for Amazon Inspector.
- Enable AWS WAF on all Amazon API Gateways, Amazon CloudFront distributions, and Application Load Balancers as applicable. Use of AWS Firewall Manager is recommended at the root account level to simplify management of rules that can be applied to all AWS WAFs to protect resources quickly.
- Consider AWS WAF security automations to block known threat actors.
- If using containers, use Amazon ECR with ECR Image Scanning as part of the build pipeline.
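As a sketch of enforcing the "no public buckets" rule above programmatically with boto3 (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# Block every form of public access on the bucket, regardless of any
# object ACLs or bucket policies applied later.
s3.put_public_access_block(
    Bucket="example-app-assets",  # placeholder bucket name
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```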
Protect data in transit and at rest
Classify your data into sensitivity levels and use mechanisms such as encryption, tokenisation and access control where appropriate.
- Use AWS KMS Customer Managed Keys where possible, enabling encryption at rest on S3, RDS and EC2 EBS (see the sketch after this list).
- Audit and review the use of these keys using CloudTrail log analysis.
- Use AWS Config rules to ensure all EBS volumes are encrypted. This can be enabled at the organisation level.
- Enable Amazon Macie to check for public S3 buckets or files and S3 buckets without encryption enabled. This can be enabled at the organisation level.
- Use HTTPS encryption for all communication, using AWS Certificate Manager to manage certificates; it automatically handles key storage, rotation and renewal.
- Consider restricting to TLS 1.2 or later only.
- Consider enabling only Perfect Forward Secrecy cipher suites (e.g. ECDHE).
- Use VPC endpoints when transmitting sensitive data to/from specific services, such as S3 from within a VPC, to prevent it travelling over the public internet.
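A sketch of setting default encryption at rest on an S3 bucket with a Customer Managed Key using boto3 (the bucket name and key ARN are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Every new object will be encrypted with the customer managed key by default.
s3.put_bucket_encryption(
    Bucket="example-app-data",  # placeholder bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "arn:aws:kms:eu-west-2:123456789012:key/example",
            },
            "BucketKeyEnabled": True,  # fewer KMS calls, lower cost
        }],
    },
)
```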
People factors
Managing access
- Ensure the joiners and leavers process covers managing access to all systems.
- Ensure password managers are used with MFA enabled.
Keep people away from data
Use tools to reduce or eliminate the need for direct access to, or manual processing of, data. This reduces the risk of mishandling, unintended modification, and human error when handling sensitive data.
- Review VPC Flow Logs.
- Use Access Analyzer for S3 to assess who has access to which data.
- Use and configure IAM Access Analyzer to audit access and reduce it where necessary.
Prepare for security events
Prepare for potential incidents by maintaining incident management and investigation processes that align to your organisational requirements. Run incident response simulations and use tools with automation to increase your speed for detection, investigation and recovery.
- Review NIST SP 800-61 Computer Security Incident Handling Guide.
- Review AWS Well-Architected: Security: SEC 10.
- Ensure incident responders have access to AWS and systems to investigate and recover effectively.
- Ensure detailed logs are available and that these are segregated from application accounts.