Manage

Once a solution has been designed, developed, and deployed, another job begins, one that may be unfamiliar to many: management. Regardless of how much development is ongoing, we are still responsible for managing what we roll out to production (or to other environments).

Deployed solutions must be monitored, and we must ensure that we take regular backups _that are also tested_, keep disaster recovery plans up to date, follow up on vulnerable dependencies, and much more.

1 - Verify the Design

When developing a solution, we should always validate that the solution adheres to the design. If it deviates, we must either correct the solution or update the design.

When we create a design for a new solution, there may be details we do not know, or unexpected complications may arise during implementation. This can result in the original design deviating from the final solution.

Documentation is crucial for understanding how a solution is set up and how it works, especially if an incident occurs that requires redeployment or disaster recovery. To ensure that the gap between documentation and the final product is not too large, we should always validate the design afterward.

What Should We Check?

One of the most important things to check is everything around the code that is not necessarily expressed as code, such as the resources we use, the network setup, and firewall openings. We should also review IAM and the permissions granted to resources and applications. Elements in the design that were never implemented should be removed from the design. Elements that were implemented but are not in the design should either be added to the design or removed from the solution.
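One way to make this check repeatable is to keep a list of the resources, network openings, and other elements the design describes, and compare it with what is actually deployed. The Python sketch below assumes both inventories are already available as sets of names (all names are hypothetical); how the deployed inventory is exported depends entirely on the platform.

```python
# Hypothetical inventories for illustration. In practice, "designed" would be
# maintained alongside the design documentation and "deployed" exported from
# the platform.
designed = {"vnet-app", "nsg-backend", "sql-main", "app-frontend", "app-backend"}
deployed = {"vnet-app", "nsg-backend", "sql-main", "app-backend", "storage-temp"}


def compare_inventories(designed: set[str], deployed: set[str]) -> dict[str, set[str]]:
    """Return the elements that exist only in the design or only in the solution."""
    return {
        # Described in the design but never implemented: remove from the design.
        "only_in_design": designed - deployed,
        # Implemented but not described: update the design or remove the resource.
        "only_in_solution": deployed - designed,
    }


if __name__ == "__main__":
    for category, names in compare_inventories(designed, deployed).items():
        print(category, sorted(names))
```

The same comparison works for firewall rules or role assignments if each one is represented as a comparable value, for example a tuple of source, destination, and port.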

How Can We Check?

This depends greatly on the form and nature of the project, but in many cases, the IT organization at the customer (for projects hosted at the customer) can help. If the solution runs at Bouvet, Intern IT & Security can certainly assist with checking things like network configurations or point you in the right direction. There is a lot you can do yourself as well, but check with Intern IT & Security before installing tools and running scans or similar actions.

2 - Audit or Review of Project or Delivery

Regardless of our own controls, the customer or recipient sometimes wants to review the quality and procedures of what is being delivered. Security and quality require different verification measures than functional aspects, which are typically easier to verify against the customer’s requirements.

Not all deliveries are subject to audits or reviews from the customer’s side. Audits typically apply to deliveries where Bouvet has taken the lead and the customer receives a solution without being heavily involved in running the project. An audit is a tool the customer can use to ensure that Bouvet has done the job as agreed, giving the customer the opportunity to review our routines and work processes to confirm that we have delivered as expected.

The right to an audit is agreed upon in the contract we work under, but not all customers take advantage of it. In many cases, an audit is most relevant when the project is delivered, or at some point after delivery when the solution is in operation. The goal is then to verify that the project’s requirements are met and that the solution is managed and operated in a way that aligns with what the customer expects and demands.

What Is Required of Us?

This will vary from contract to contract, but overall, we should ensure that the project has the documentation needed to demonstrate that the security of the delivery is maintained. We should always deliver a minimum level of security, but if the customer has chosen not to follow our recommendations or has declined additional services, we must document this properly.

3 - Logging and Monitoring

When a solution is in operation, logging is one of the most important tools we have. Collecting information is critical for gaining insight into what is happening with the solution and for responding to events, but only if someone actually monitors it.

Regardless of where a solution is deployed, we should ensure that it is monitored. Even if it is only available on the intranet to internal users working from approved devices over VPN, logs are important if one of those users or devices is compromised. A typical DevOps team will collect some information to help debug the application’s functionality, but we also need other information to assess the security context around it.

Remember

Regardless of the need, remember that privacy applies to logs as well! Do not collect more information than you need, and logs must be deletable after a given period.
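Data minimization and retention can both be expressed as simple rules. The Python sketch below drops every field that is not on an explicit allow-list before an event is logged, and decides whether an entry has passed its retention period. The allow-list and the 90-day period are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical allow-list: only the fields the team has decided it needs.
ALLOWED_FIELDS = {"timestamp", "event", "user_id", "outcome", "source_ip"}

# Assumed retention period for illustration only. The real value must come
# from the customer's requirements and applicable privacy rules.
RETENTION = timedelta(days=90)


def minimize(event: dict) -> dict:
    """Drop every field that is not explicitly allowed before the event is logged."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


def is_expired(entry: dict, now: datetime, retention: timedelta = RETENTION) -> bool:
    """Return True when an entry is older than the retention period and should be deleted.

    Both entry["timestamp"] and now are expected to be timezone-aware datetimes.
    """
    return now - entry["timestamp"] > retention


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    raw = {"timestamp": now, "event": "login", "user_id": "u123",
           "outcome": "success", "password": "should-never-be-logged"}
    print(minimize(raw))  # the password field is dropped
```

Many log platforms let you configure retention on the storage itself rather than in application code; the function only illustrates the rule such a setting enforces.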

Logging has three primary purposes:

  • Intrusion detection - We must be able to detect if someone is attacking the system
  • Investigation basis - We must have enough information to understand what happened, how it happened, and who did it
  • Compliance - We must satisfy customer or external requirements, such as those set by authorities

What Should We Log?

What we log will vary greatly depending on who the customer is, the risk and threat landscape they operate in, and their needs for log information. In some cases, the customer will have its own security organization, typically a Security Operations Center (SOC), responsible for monitoring networks and applications. The SOC will then have requirements for what to log and how, but if no such requirements exist, we must define our own as a starting point.

Below are some points that should be an absolute minimum, but the team must understand what is logged, why it is logged, and how this information relates to other requirements such as privacy.

Authentications and Failed Authentication Attempts

If someone logs in to the solution, this should be logged. This is especially important if the login comes from a location the user does not normally log in from, or uses a different browser or client than usual. Failed logins should also be logged so that it is possible to act on them.

Errors during JWT validation or other session-related errors should also be logged so that they can be reviewed afterward.
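A minimal sketch of how an application might record logins and failed logins as structured events, using only Python's standard `logging` and `json` modules. It flags a login as unusual when the combination of location and client has not been seen before for that user. The field names, the in-memory `known_contexts` store, and the assumption that the caller has already resolved a country and user agent are all hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("security")

# Hypothetical in-memory record of (country, user agent) pairs seen for each
# user after a successful login. A real solution would persist this or rely on
# signals from the identity provider.
known_contexts: dict[str, set[tuple[str, str]]] = {}


def log_authentication(user_id: str, success: bool, country: str, user_agent: str) -> dict:
    """Log a login attempt and flag it if the location/client combination is new."""
    context = (country, user_agent)
    seen = known_contexts.setdefault(user_id, set())
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "authentication",
        "user_id": user_id,
        "outcome": "success" if success else "failure",
        "country": country,
        "user_agent": user_agent,
        # Only flag when the user has history and this combination is new.
        "unusual_context": bool(seen) and context not in seen,
    }
    if success:
        seen.add(context)
    # Failed attempts use WARNING so they can be alerted on separately.
    logger.log(logging.INFO if success else logging.WARNING, json.dumps(event))
    return event
```

Writing each event as one JSON object keeps the fields machine-readable, so a log platform or SOC can filter on `outcome` or `unusual_context` without parsing free text.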

Unauthorized Access Attempts and Access Changes

Events where users try to access functionality they are not authorized for are important signals that must be captured. This could be as simple as a user trying a URL received from a colleague, but it could also be an attacker trying to map or test the application. Regardless of the cause, this information must be preserved: if an incident occurs later, we need to be able to describe the movement patterns and activity leading up to it.

If the application supports elevating or changing permissions, these are also typical events that need to be logged. Elevation is a mechanism where a user is given additional permissions, but these must be “turned on” before they are available - often with an extra level of authentication such as MFA or similar. Examples of such mechanisms are sudo in Linux or Privileged Identity Management (PIM) in Azure. When these are activated, it is important that the logs reflect this since errors or weaknesses in these solutions would be critical for the application’s security.
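The Python sketch below illustrates both kinds of events from this section: every denied authorization check is logged regardless of its cause, and every attempt to activate elevated permissions is logged whether it is granted or refused. It uses a hypothetical role-to-permission map and treats a verified MFA step as the only precondition for elevation; it does not integrate with sudo or PIM.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("security")

# Hypothetical permission model: role -> permitted actions.
PERMISSIONS = {
    "reader": {"report:read"},
    "admin": {"report:read", "report:delete"},
}


def audit(event: dict) -> None:
    """Write a security event as one JSON line with a UTC timestamp."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **event}
    logger.warning(json.dumps(record))


def authorize(user_id: str, role: str, action: str, resource: str) -> bool:
    """Check a permission and log every denied attempt, whatever its cause."""
    allowed = action in PERMISSIONS.get(role, set())
    if not allowed:
        audit({"event": "access_denied", "user_id": user_id, "role": role,
               "action": action, "resource": resource})
    return allowed


def activate_elevation(user_id: str, target_role: str, mfa_verified: bool, reason: str) -> bool:
    """Log every attempt to activate elevated permissions, granted or refused."""
    # Assumption for this sketch: a verified MFA step is the only precondition.
    granted = mfa_verified
    audit({"event": "elevation_granted" if granted else "elevation_refused",
           "user_id": user_id, "target_role": target_role,
           "mfa_verified": mfa_verified, "reason": reason})
    return granted
```

Logging refused elevations as well as granted ones matters: repeated refusals can be an early sign that someone is trying to abuse the mechanism.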

Application Errors, Network Errors, and Similar

If errors occur in the application, these should also be logged. We should never give the user more information than absolutely necessary, but the details should be included in the logs so that they can be monitored or reviewed later.
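A common way to follow this rule is to catch unexpected errors at the outermost layer, log the full details, and return a generic message with a reference the user can quote to support. The Python sketch below uses a hypothetical `handle_request` wrapper and a random UUID as the correlation ID; most web frameworks offer their own error-handling hooks for the same purpose.

```python
import logging
import uuid
from typing import Any, Callable

logger = logging.getLogger("app")


def handle_request(handler: Callable[..., Any], *args: Any) -> dict:
    """Run a request handler; on failure, log full details and return a generic message."""
    try:
        return {"status": 200, "body": handler(*args)}
    except Exception:
        # The correlation ID links the user's error message to the detailed log entry.
        correlation_id = str(uuid.uuid4())
        # logger.exception logs at ERROR level and includes the stack trace.
        logger.exception("Unhandled error (correlation_id=%s)", correlation_id)
        # The user only sees a generic message and the reference, never internals.
        return {"status": 500,
                "body": f"An unexpected error occurred. Reference: {correlation_id}"}
```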

If the application depends on the network, for example by monitoring network connections or connecting to other resources, disruptions or outages in these should also be logged, as they may be important indicators.

Logging Unexpected Inputs

Every input an application accepts can be described, even free-text fields where the user can enter anything. Inputs that violate validation rules, and attempts to change information that should not normally be changeable, are typical cases that need to be logged.
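As a sketch of what this can look like, the Python example below logs two kinds of unexpected input: a value that violates a validation rule, and an attempt to change fields the user should not be able to change. The order-reference format and the set of editable fields are hypothetical. The validation failure logs the input length rather than the raw value, since raw input may be large, hostile, or contain personal data.

```python
import logging
import re

logger = logging.getLogger("security")

# Hypothetical validation rule: order references are "ORD-" followed by six digits.
ORDER_REF = re.compile(r"ORD-\d{6}")
# Hypothetical set of profile fields a user may change on their own.
EDITABLE_FIELDS = {"display_name", "phone"}


def validate_order_ref(value: str, user_id: str) -> bool:
    """Accept a valid order reference; log and reject anything else."""
    if ORDER_REF.fullmatch(value):
        return True
    # Log the violation without echoing the raw input.
    logger.warning("input_validation_failed user=%s field=order_ref length=%d",
                   user_id, len(value))
    return False


def filter_profile_update(changes: dict, user_id: str) -> dict:
    """Drop and log attempts to change fields that should not be user-editable."""
    forbidden = set(changes) - EDITABLE_FIELDS
    if forbidden:
        logger.warning("forbidden_field_change user=%s fields=%s", user_id, sorted(forbidden))
    return {key: value for key, value in changes.items() if key in EDITABLE_FIELDS}
```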

If the application supports file uploads or similar, deviations from expected files, such as discrepancies between file type and file signature or unusually large or small files, should be logged.
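A minimal Python sketch of these checks: it compares the file extension with well-known file signatures ("magic bytes") for PNG, JPEG, and PDF, and flags unusually small or large files. The size limits are assumptions, and a real application would typically accept a different set of types.

```python
import logging
import os

logger = logging.getLogger("security")

# Well-known leading bytes ("magic bytes") for a few accepted file types.
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}
# Assumed size limits for illustration; set them from the application's real needs.
MIN_SIZE = 100
MAX_SIZE = 10 * 1024 * 1024


def check_upload(filename: str, data: bytes, user_id: str) -> list[str]:
    """Return the anomalies found in an uploaded file, logging them if there are any."""
    anomalies = []
    extension = os.path.splitext(filename)[1].lower()
    expected = SIGNATURES.get(extension)
    if expected is None:
        anomalies.append("unsupported_extension")
    elif not data.startswith(expected):
        # The content does not match what the extension claims it is.
        anomalies.append("signature_mismatch")
    if len(data) < MIN_SIZE:
        anomalies.append("unusually_small")
    elif len(data) > MAX_SIZE:
        anomalies.append("unusually_large")
    if anomalies:
        # %r escapes control characters in the user-supplied file name.
        logger.warning("upload_anomaly user=%s file=%r size=%d anomalies=%s",
                       user_id, filename, len(data), anomalies)
    return anomalies
```

The file name is logged with `%r` so that characters such as newlines in a user-supplied name are escaped instead of being written raw into the log, where they could otherwise be used to forge log lines.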

How Do We Log?

How we log will also vary with the project, the platform we run on, and the resources we are allowed to use. An important point to keep in mind when designing the logging solution is that logs are a target for attacks! An attacker who can exploit vulnerabilities and then manipulate the logs can both hide their activity and plant false evidence.

All logs should be stored in a place where data can be added but not changed afterward, typically a central log solution. An added advantage of such solutions is that you can collect logs from many different sources, such as cloud resources, network components, and applications, in one place. This gives you insight from multiple dimensions when reviewing an incident, which can be useful for understanding the overall situation.
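Write-once storage is usually provided by the log platform itself. A complementary technique, shown here as a Python sketch, is a hash chain: each entry stores the SHA-256 hash of its content together with the hash of the previous entry, so editing, reordering, or removing an earlier entry becomes detectable. This is a different mechanism from append-only storage; it does not prevent changes, it only makes them visible. The function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _entry_hash(message: dict, prev_hash: str) -> str:
    # sort_keys gives a stable serialization, so the same entry always hashes the same.
    payload = json.dumps({"message": message, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], message: dict) -> list[dict]:
    """Append a message to the chain, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"message": message, "prev_hash": prev_hash,
                  "hash": _entry_hash(message, prev_hash)})
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; editing, reordering, or removing an earlier entry fails."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(entry["message"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

Removing entries from the end of the chain is not detected by `verify_chain` alone; for that, the latest hash has to be stored somewhere an attacker cannot reach.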

Timestamps and Log Format

Being able to determine the sequence of events is incredibly important. We must therefore understand how the different log sources synchronize their clocks, so that we can be confident about how an event on node A relates to another event logged two seconds later on node B.

It is also important to standardize log formats where possible. Much logging centers around the log message itself, which is typically text-based, but all metadata should be standardized where possible. Define what you need to see and ensure this is available from the various sources.
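As an illustration of both points, the Python sketch below is a formatter for the standard `logging` module that writes every record as a JSON object with the same metadata fields and a UTC timestamp in ISO 8601 format. The field names and the `service` value are assumptions. Consistent timestamps do not fix unsynchronized clocks; the nodes still need a common time source such as NTP.

```python
import json
import logging
import socket
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format every log record as one JSON object with a fixed set of metadata fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service
        self.host = socket.gethostname()

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # record.created is seconds since the epoch; rendering it in UTC avoids
            # mixing local time zones when logs from several nodes are combined.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "host": self.host,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service="backend"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    root.info("user %s logged in", "alice")
```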

4 - Dependency Management

The security status of our dependencies will change over time, and vulnerabilities that we must mitigate will inevitably be discovered. Mitigation can be as simple as updating to a new version, but it may also require more significant changes to the application.

When the team is in maintenance mode, most of the issues mentioned in the article on Software Supply Chain still apply. You will encounter situations where:

  • A critical vulnerability is discovered in a package you use
  • Packages are deprecated and replaced with something new that is not directly compatible with the old
  • Developers behind packages stop maintaining them
  • Malicious actors take over a package and use it to spread malware

...and certainly other scenarios that require you to act. To make sure that packages affected by one or more of the points above are addressed, tools like Sonatype and others can monitor various stages of the lifecycle and alert you when vulnerabilities or other events affecting quality occur.
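At their core, such tools compare the versions you depend on with advisory data and alert you on matches. The Python sketch below shows only that core idea, with a single hypothetical advisory (`examplelib` and `EXAMPLE-0001` are invented for illustration); it is not how Sonatype or any other product is implemented.

```python
# Hypothetical advisory data for illustration. In practice this comes from a
# dependency monitoring tool or a vulnerability database.
ADVISORIES = [
    {"package": "examplelib", "fixed_in": "2.4.1", "id": "EXAMPLE-0001", "severity": "critical"},
]


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '2.4.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def find_vulnerable(installed: dict[str, str], advisories: list[dict] = ADVISORIES) -> list[dict]:
    """Return the advisories affecting installed packages older than the fixed version."""
    return [
        advisory for advisory in advisories
        if advisory["package"] in installed
        and parse_version(installed[advisory["package"]]) < parse_version(advisory["fixed_in"])
    ]


if __name__ == "__main__":
    for advisory in find_vulnerable({"examplelib": "2.3.9", "otherlib": "1.0.0"}):
        print(f"{advisory['severity'].upper()}: {advisory['package']} ({advisory['id']}), "
              f"upgrade to {advisory['fixed_in']} or later")
```

The version parsing only handles plain dotted numbers. Real ecosystems have their own version rules (for example SemVer or Python's PEP 440), which is one reason to rely on an established tool rather than a home-made check.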

5 - Preparedness

An untested backup is worthless, and the same applies to all disaster recovery plans unless they are tested. The team must verify backups and plans regularly so that everyone knows what needs to happen.
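Testing a backup means actually restoring it and checking the result. The Python sketch below uses SQLite purely for illustration: it copies a database with the standard library's `sqlite3` backup API, then opens the copy, compares row counts per table, and runs SQLite's `PRAGMA integrity_check`. The helper names are hypothetical; in a real solution, the restore would go to an alternative environment using the platform's own backup mechanism, and the checks would be more thorough.

```python
import sqlite3


def backup_database(source: sqlite3.Connection, target_path: str) -> None:
    """Copy the whole database to target_path using SQLite's online backup API."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)
    finally:
        target.close()


def verify_restore(source: sqlite3.Connection, backup_path: str, tables: list[str]) -> bool:
    """Open the backup as a restored copy and compare it with the source."""
    restored = sqlite3.connect(backup_path)
    try:
        for table in tables:
            # Table names come from our own list, never from user input.
            query = f"SELECT COUNT(*) FROM {table}"
            if source.execute(query).fetchone() != restored.execute(query).fetchone():
                return False
        # integrity_check returns the single row ('ok',) for a healthy database.
        return restored.execute("PRAGMA integrity_check").fetchone() == ("ok",)
    finally:
        restored.close()
```

Comparing row counts only proves that the copy is complete at a coarse level; a proper restore test also checks that the application can actually run against the restored data.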

If the team has done everything right so far, you have a disaster recovery plan that tells you what needs to be done to restore infrastructure, applications, and data to return to normal operations.

The reasons for needing to restore can be many and vary greatly in scope. Who hasn’t run a delete from <table> where x = 'something' with missing or incorrect parameters, or dropped the wrong table from a database? Or deleted a server or app service from a prod environment by mistake (I was just trying to fix something quickly…)? In such cases, recovery can be quick if you know what went wrong, but in more complex cases, such as unknown software errors or problems at a cloud service provider, recovery can be considerably harder.
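One inexpensive safeguard against the missing-WHERE mistake is to run destructive statements inside a transaction and refuse to commit if more rows are affected than expected. The hypothetical `guarded_delete` helper below sketches this in Python with SQLite; most relational databases support the same pattern. It reduces the risk but does not replace backups.

```python
import sqlite3


def guarded_delete(conn: sqlite3.Connection, table: str, where: str,
                   params: tuple, expected_max: int) -> int:
    """Delete rows inside a transaction, rolling back if too many rows are affected.

    table and where are written by the operator and must never come from user
    input; values go through params as bound parameters.
    """
    if not where.strip():
        raise ValueError("Refusing to run DELETE without a WHERE clause")
    # In its default transaction mode, Python's sqlite3 module opens a transaction
    # implicitly before a DELETE, so nothing is permanent until commit().
    cursor = conn.execute(f"DELETE FROM {table} WHERE {where}", params)
    deleted = cursor.rowcount
    if deleted > expected_max:
        conn.rollback()
        raise RuntimeError(
            f"DELETE affected {deleted} rows (expected at most {expected_max}); rolled back"
        )
    conn.commit()
    return deleted
```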

For the plans to have real value, they must be tested in practice. In a perfect world, the plans should be so detailed that recovery is possible even if the entire team gets hit by a bus or otherwise becomes unavailable. In practice, this is often difficult to achieve, but the team should aim to create a good recipe for how a recovery can be carried out under given conditions and then test it regularly in an alternative environment.

An example recipe for the solution outlined in the article on system diagrams could be as follows. The premise of the plan below is that we have source code and pipelines available in, for example, Azure DevOps, but the application and resources have mysteriously disappeared from Azure:

  1. Check that new subscriptions are in place in Azure
    • Configure Azure Pipelines to deploy to these
    • Verify that all Entra groups are available
  2. Deploy infrastructure as code
  3. Configure NSGs and firewalls (if not done as code)
    • Turn off access outside the delivery team to avoid user interference with the restore process
  4. Verify that resources have access to the data platform
  5. Verify access to the database
  6. Restore application and data:
    1. Restore data to the database from the latest backup
    2. Deploy backend
    3. Deploy frontend
  7. Verify that the application works
  8. Publish the Power BI report
    • Verify that it can read data from the backend
  9. Turn on access for end-users so they can use the application again

Note that each step may need additional detail, such as references to the access packages or group memberships the person performing the restore needs in order to obtain the necessary access.

6 - Contingency Plans and Incident Management

When an incident occurs, it is important to be prepared to avoid wasting valuable time on activities that should have been ready in advance. Who should be notified, who is responsible, and who can help?

Many people think of security incidents as targeted attacks in which someone hacks into a solution. Sometimes that is the case, but an incident can be much more.

NSM defines a security incident as “A deviation situation where there is a potential for loss of confidentiality, integrity, and/or availability of information or ICT services. A security incident can occur as a result of a data attack, technical failure, or unintentional errors.” In other words, an incident can be almost anything that affects confidentiality, integrity, and availability, and depending on the context, different customers will have different requirements for when we need to report and/or act on this.

Preparations

This is covered in several articles under “Plan,” but one of the most important things you can do is document the requirements we must comply with and our responsibilities within the different phases, in addition to contact points with the customer. Some customers are very security-focused and will monitor and alert the delivery team on their own, while others rely on the teams to monitor themselves.

NSM lists several useful points that should also be considered within the team. Many of these are aimed at the organization as a whole, but it can be important for the team to be aware of the different measures.

When an Incident Occurs

Incidents can take many forms. An incident can be weaknesses or vulnerabilities discovered in an application, dependencies, or the runtime environment, but it can also be attacks - both obvious and more covert.

If you discover or have reason to believe that a solution is under attack, this must be reported to the customer immediately. The attacked solution is not always the real target; in many cases, it is just a stepping stone to another system. It is therefore also important to know what access and network openings the solution has to other solutions, so the customer’s IT organization can check these for signs of attacks.

If you come across signs that a solution has been attacked or used for an attack, it is also important to notify the customer so they can secure information and evidence for further investigation.

Keep in mind

Handling and investigating incidents is a specialized field. If you come across signs that something may have happened, inform your contact point and wait for instructions from them before taking any action.
