Cloud-Native as the Future of Data Loss Prevention

Data loss prevention (DLP) is one of the most important tools enterprises have to protect themselves from modern security threats like data exfiltration, data leakage, and other types of sensitive data and secrets exposure. Many organizations seem to understand this, with the DLP market expected to grow worldwide in the coming years. However, not all approaches to DLP are created equal. DLP solutions vary in the scope of remediation options they provide as well as the security layers they apply to. Traditionally, data loss prevention has been an on-premises or endpoint solution meant to enforce policies on devices connected over specific networks. As cloud adoption accelerates, though, the utility of these traditional approaches to DLP will substantially decrease.

Established data loss prevention vendors have attempted to address these gaps with developments like endpoint DLP and cloud access security brokers (CASBs), which give security teams visibility into devices and programs running outside their walls or sanctioned environments. While both solutions minimize security blind spots, at least relative to network-layer and on-prem solutions, they can result in inconsistent enforcement. Endpoint DLP, for example, does not provide visibility at the application layer, meaning that policy enforcement is limited to managing what programs and data are installed on a device. CASBs can be somewhat more sophisticated in determining what cloud applications are permissible on a device or network, but may still face similar shortfalls surrounding behavior and data within cloud applications.

Cloud adoption was expected to grow nearly 17% between 2019 and 2020; however, as more enterprises embrace cloud-first strategies for workforce management and business continuity during the COVID-19 pandemic, we’re likely to see even more aggressive cloud adoption. With more data in the cloud, the need for policy remediation and data visibility at the application layer will only increase and organizations will begin to seek cloud-native approaches to cloud security.

What is cloud-native data loss prevention?

The explosion of cloud technologies in the past decade has brought new architectural models for applications and computing systems. The concept of a cloud-native architecture, while not new, is a development that’s taken off in the last five years. But what exactly does cloud-native mean, and how can it apply to security products like data loss prevention (DLP)?

Cloud-native describes a growing class of platforms that are built in the cloud, for the cloud. True cloud-native data loss prevention is defined by the following features:

  • Agentless. Cloud-native DLP solutions aren’t deployed as software programs that require installation; instead, they integrate with the applications they secure through APIs. This makes deployment easy and platform updates effortless, without getting end-users or IT involved. 
  • API-driven. Central to cloud-native data loss prevention is the API-driven nature of such solutions. Connecting with cloud platforms via API means that visibility and security policies apply immediately at the application layer. API-driven solutions can derive platform-specific context and metadata, as well as provide granular, platform-specific actions, versus broad-brush blocking on the network.
  • Agnostic. True cloud-native solutions are platform, endpoint, and network agnostic in that they’re capable of integrating with cloud platforms quickly and can provide single pane of glass visibility across the cloud.
  • Automated. True cloud-native solutions don’t just provide visibility into the cloud, but help automate policies whenever possible. The sheer volume of data that moves through cloud systems combined with the always-on nature of cloud applications means that incidents can happen at any time and will require immediate remediation. Automation ensures that security teams can respond to these as quickly as possible.
  • Accurate. Finally, in order to help security teams process the massive amounts of data in the cloud, cloud-native DLP must be accurate. The accuracy of such platforms is often enabled by the same systems that make them automated — an effective use of machine learning that can quickly and accurately identify when business-critical data has been exposed.
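To make the agentless, API-driven model concrete, here is a minimal sketch in Python. Everything in it is illustrative: `fetch_recent_messages` stands in for a real SaaS platform API call, and a single regex stands in for the broad, often ML-based detection a real platform would use.

```python
import re

# Hypothetical stand-in for a SaaS platform's message API; a real
# integration would call the platform's REST API with an OAuth token
# rather than return canned data.
def fetch_recent_messages():
    return [
        {"id": "1", "text": "lunch at noon?"},
        {"id": "2", "text": "applicant SSN is 078-05-1120"},
    ]

# One toy detector: US Social Security numbers. Real cloud-native DLP
# runs many detectors to keep accuracy high.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_and_remediate(messages):
    """Flag violating messages and return redacted copies (automation)."""
    findings = []
    for msg in messages:
        if SSN.search(msg["text"]):
            findings.append(msg["id"])
            msg["text"] = SSN.sub("[REDACTED]", msg["text"])
    return findings, messages

findings, cleaned = scan_and_remediate(fetch_recent_messages())
print(findings)            # ['2']
print(cleaned[1]["text"])  # applicant SSN is [REDACTED]
```

A real integration would subscribe to platform events via webhooks rather than poll, and would surface findings to the security team rather than just print them.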

What are the advantages of cloud-native DLP?

When you consider the capabilities listed above, cloud-native DLP is designed to help organizations get a handle on protecting the massive volumes of data moving in and out of data silos daily. With organizations understanding that the security of their data in the cloud is their responsibility, security teams are increasingly investing in tools designed to help them address visibility and policy blindspots. While it might be the case that cloud-native data loss prevention platforms aren’t the only security tools companies choose to invest in, it’s clear that they’ll be one of the most essential parts of their security toolkit.

This post originally appeared on Nightfall and is reproduced here with permission.

About Nightfall

Nightfall is the industry’s first cloud-native DLP platform that discovers, classifies, and protects data via machine learning. Nightfall is designed to work with popular SaaS applications like Slack and GitHub as well as IaaS platforms like AWS. You can schedule a demo with us below to see the Nightfall platform in action.

Kill That PowerPoint

Instead of a multi-slide presentation on your next #cloud project, use a briefing note with six simple paragraphs:

  1. The Challenge. This defines “where we are now” and is always either a problem or an opportunity. Don’t be afraid to state problems; it’s not ideal to hide everything under the MBA-speak “opportunity”, so differentiate between problems and true opportunities.
  2. The Undesired Outcome. This defines “where we don’t want to be”: what will happen if the problem or opportunity is not addressed. You can also think of this as the opportunity cost if we spend the money on something else.
  3. The Desired Outcome. This defines “where we do want to be,” which should obviously be better than the undesired outcome or the status quo.
  4. The Proposed Solution. This defines what must be done to avoid the undesired outcome and achieve the desired one.
  5. The Risk Remover. This explains in simple terms why the proposed solution is likely to succeed and unlikely to fail. (This is not a risk impact analysis.)
  6. The Call to Action. This tells the reader the specific decision you want made that will put the solution into motion to achieve the desired outcome.
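A briefing note this rigid is easy to template. The sketch below is illustrative only: the section names come from the list above, and the renderer refuses to produce a note unless all six paragraphs are supplied.

```python
# The six mandatory sections of the briefing note, in order, with a
# one-line reminder of what each should contain.
BRIEFING_SECTIONS = [
    ("The Challenge", "Where we are now: a problem or a true opportunity."),
    ("The Undesired Outcome", "Where we don't want to be if nothing is done."),
    ("The Desired Outcome", "Where we do want to be."),
    ("The Proposed Solution", "What must be done to get there."),
    ("The Risk Remover", "Why the solution is likely to succeed."),
    ("The Call to Action", "The specific decision you want made."),
]

def briefing_note(content: dict) -> str:
    """Render a six-paragraph briefing note; every section is mandatory."""
    missing = [name for name, _ in BRIEFING_SECTIONS if name not in content]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return "\n\n".join(f"{i}. {name}\n{content[name]}"
                       for i, (name, _) in enumerate(BRIEFING_SECTIONS, 1))
```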

5 Common Accidental Sources of Data Leaks

In cybersecurity and infosec, it’s common to assume that criminals are behind all data breaches and major security events. Bad actors are easy to blame for information leaks or account takeovers, because they’re the ones taking advantage of vulnerabilities in systems to worm their way in and cause massive damage. But how do they gain access in the first place? Most of the time, well-meaning everyday people are the real source of data insecurity.  

A study of data from 2016 and 2017 indicated that 92% of security data incidents and 84% of confirmed data breaches were unintentional or inadvertent. Accidental data loss continues to plague IT teams, especially as more organizations are rapidly moving to the cloud. While it’s important to prioritize action against outside threats, make sure to include a strategy to minimize the damage from accidental breaches as well.   

This list of five common sources of accidental data leaks will help you identify the problems that could be lurking in your systems, apps, and platforms. Use these examples to prepare tighter security controls and keep internal problems from becoming major issues across your entire organization.      

#1: Exposing secrets in code repositories like GitHub

In January 2020, a security researcher found Canadian telecom company Rogers Communications had exposed passwords, private keys, and source code in two public accounts on GitHub. As the investigation into the Rogers breach went on, the researcher found five more public folders on GitHub containing Rogers customer data, including personally identifiable information (PII) like phone numbers.  

This kind of thing happens all the time, like in the case of German automaker Daimler leaking Mercedes-Benz’s source code for smart car components through an unsecured GitLab server in May and Scotiabank exposing source code and private login keys to backend systems in GitHub in September 2019.  

Businesses looking for a secrets detection solution for GitHub should consider Nightfall Radar for GitHub. It’s a fast and easy way to prevent data loss in the platform and avoid problems like exposing sensitive data in code repos, with automated scanning and customizable alerts and reporting to help you take control of your company’s data.      
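As a complement to a dedicated scanner, even a simple pre-commit check can catch the most obvious secrets before they reach a repo. The two patterns below (an AWS access key ID and a PEM private-key header) are illustrative only; commercial scanners detect far more.

```python
import re

# Two illustrative secret patterns; production scanners use many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_text(path: str, text: str):
    """Return (path, line_number, pattern_name) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((path, lineno, name))
    return hits

# In a pre-commit hook you would scan each staged file and abort the
# commit on any hit; here we scan an in-memory example instead.
sample = "db_host = example.internal\nkey = AKIAIOSFODNN7EXAMPLE\n"
print(scan_text("config.py", sample))  # [('config.py', 2, 'aws_access_key_id')]
```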

#2: Leaking data from misconfigured buckets in AWS S3

Like GitHub, AWS S3 can be a source of accidental data insecurity. All it takes is one improperly configured bucket to expose huge amounts of data. AWS S3 differs from GitHub in one big way here: GitHub repos have sharing permissions set from the start, with “public” as the default choice, while AWS buckets today are private by default. This means user error is behind most major AWS data leaks: data is only exposed when someone makes a bucket public.

Outpost 24 cloud security director Sergio Lourerio spoke to Computer Weekly in a January 2020 interview on the rising danger of data leakage through public AWS S3 buckets. He pointed out that because we are all still in the early days of cloud infrastructure security, opportunistic attacks on publicly accessible AWS S3 buckets have become prevalent.

“You’d be amazed to see the data you can find there just by scanning low-hanging data in cloud infrastructures,” Lourerio said. “And it only takes a couple of API calls to do it. With a lot of data being migrated to the cloud for use cases like data mining, and lack of knowledge of security best practices on [Microsoft] Azure and AWS, it is very simple to get something wrong.”

Earlier this year, UK-based document printing production company Doxzoo had a major cloud security breach thanks to a server misconfiguration that exposed an AWS S3 bucket with over 270,000 records and 34 gigabytes of data. The data included print jobs for several high-profile clients such as the U.S. and UK military branches and Fortune 500 companies — leaving PII like passport scans and PCI data at risk for anyone to see or steal.  

Even worse, the exposure wasn’t reported to Doxzoo until four days after the misconfiguration was found via a routine scanning project. Massive amounts of business-critical data were up for grabs to anyone who had the URL to the public AWS S3 bucket.
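Misconfigurations like this can be caught with a periodic audit of bucket ACLs. The sketch below assumes boto3 and AWS; the check itself is factored into a pure function so it can be shown (and tested) without touching a live account.

```python
# A grant to this group URI makes bucket contents readable (or worse)
# by anyone on the internet.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

def public_grants(acl: dict) -> list:
    """Return the permissions granted to the anonymous AllUsers group."""
    return [g["Permission"] for g in acl.get("Grants", [])
            if g.get("Grantee", {}).get("URI") == ALL_USERS]

# With boto3 (not run here), the audit loop would look like:
#   import boto3
#   s3 = boto3.client("s3")
#   for bucket in s3.list_buckets()["Buckets"]:
#       acl = s3.get_bucket_acl(Bucket=bucket["Name"])
#       if public_grants(acl):
#           print("PUBLIC:", bucket["Name"], public_grants(acl))

sample_acl = {"Grants": [
    {"Grantee": {"Type": "Group", "URI": ALL_USERS}, "Permission": "READ"},
    {"Grantee": {"Type": "CanonicalUser", "ID": "abc"}, "Permission": "FULL_CONTROL"},
]}
print(public_grants(sample_acl))  # ['READ']
```

Note that ACLs are only one exposure path; bucket policies and account-level public access block settings should be audited as well.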

User error among developers and infosec professionals can lead to some of the most egregious security events. The cloud isn’t the only source to blame, however. Sometimes negligence can be an IT team’s worst enemy.    

#3: Compromising millions of records through expired security certificates 

The 2017 Equifax breach is one of the worst data leaks in history, with over 143 million records exposed containing PII like names, addresses, dates of birth, Social Security numbers, and driver’s license numbers. These records were stolen by hackers who exploited a vulnerability in Apache Struts, a common open-source web application framework. The unpatched server allowed the attackers to access Equifax’s systems for over two months.

With traffic inspection blinded by an expired security certificate, the hackers had the perfect environment to keep coming back to the data-rich Equifax servers, sending more than 9,000 queries against the databases and downloading data on 265 separate occasions.

This breach shares similarities with the GitHub and AWS S3 leaks, primarily in how slow and inadequate Equifax’s response was in calming customers’ fears about their exposed data. Equifax missed the data exfiltration events happening right under its nose for 19 months, and it took another two months to update the expired certificate. Only after the update did the company notice suspicious web traffic.
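Expired certificates are also among the easiest things to monitor for. The sketch below parses the `notAfter` string that Python's `ssl` module reports for a peer certificate; the live-connection part is left as a comment because it needs network access.

```python
from datetime import datetime, timezone

def days_until_expiry(not_after: str, now: datetime) -> float:
    """Parse an OpenSSL-style notAfter string and return days remaining."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    return (expires - now).total_seconds() / 86400

# Fetching the certificate for a live host (not run here) would look like:
#   import socket, ssl
#   ctx = ssl.create_default_context()
#   with ctx.wrap_socket(socket.socket(), server_hostname=host) as s:
#       s.connect((host, 443))
#       not_after = s.getpeercert()["notAfter"]

now = datetime(2020, 9, 1, tzinfo=timezone.utc)
print(days_until_expiry("Sep 11 12:00:00 2020 GMT", now))  # 10.5
```

A cron job running a check like this across all public endpoints, alerting at 30 days out, costs almost nothing compared to a missed renewal.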

Equifax’s former chief information officer David Webb admitted in a U.S. congressional investigation report, “Had the company taken action to address its observable security issues prior to this cyberattack, the data breach could have been prevented.”  

A strong security posture starts by securing your systems wherever you find a vulnerable point. The next step is to critically examine the entities you do business with: third- and fourth-party exposure can be just as devastating in a data breach.

#4: Leaving the door open with unsecured third and fourth party vendors 

An organization that is doing everything right by controlling data exfiltration in the cloud with DLP, securing AWS S3 buckets, and maintaining current certificates on their website can still be at risk of data exposure through unsecured third and fourth party vendors.  

Damage control is hard enough when it’s just one source to deal with. But when you have to investigate and remediate a data breach that results from vendors and other business partners, there’s a lot more work to do.

Companies can accidentally leak as much as 92% of their data via URLs, cookies, or improperly configured storage. This exposure on its own is a major security problem. When you add third and fourth party vendors and services on these websites, that means the leaked information could be exposed to any of those services embedded into a compromised page.   

Third- and fourth-party vendors provide essential services for the parent company, like expedited checkout portals with payment processors. Third-party vendors often rely on fourth-party services just as the parent company relies on outside help to maximize operations; on average, 40% of the services on a website are powered by fourth parties.

This is what happened in one of Target’s worst data breach events. In December 2013, a data breach leaked over 70 million Target customer records. Scammers found their way in by stealing the credentials of a Target HVAC contractor. It may sound like a long and winding road from a third-party vendor that never touches the main company’s network, but all it takes to pull off a heist like this is one small exposure.

With all these avenues covered (code repos, website containers, other vendors), you may think your security job is done. But you must still take on email security for your employees; it’s a much easier fix for a problem that can do severe damage.

#5: Giving up on security standards with lax email policies

Email scams are the oldest trick in the cybercrime book. While some of us are still falling for phishing scams from Nigerian princes, many more well-meaning people are compromised every day simply through inadequate email security practices.

Poor password hygiene for email accounts (using “password” for your login credentials), not using multi-factor authentication when signing into accounts, or a lack of employee training and clear policies are contributing factors to the rapid rise in business email compromise (BEC).  

According to the FBI, losses from BEC attacks total over $26 billion. More scammers are using COVID-19 to make their way into inboxes and systems. Even with tougher regulations in place like the California Consumer Privacy Act (CCPA), which carries heavy penalties for noncompliance, BEC is still a major threat to any organization. Email users should take the extra security steps to ensure their accounts are safe.  

It’s hard to fight back against thieves, cybercriminals, and scammers — especially when your own people can do most of the damage right there inside the organization. Work with your teams to determine where security vulnerabilities exist within your networks, platforms, and systems, and train everyone on best practices for securing their own logins and access points. It could also help to back up all your hard work with a DLP solution like Nightfall that catches data you may have missed even before it can leave your network. 

This post originally appeared on Nightfall and is reproduced here with permission.


Making sense of COVID-19 tests and terminology

Originally published here in The Conversation under a Creative Commons Licence by Priyanka Gogna, PhD Candidate, Epidemiology, Queen’s University, Ontario

During the COVID-19 pandemic, words and phrases that have typically been limited to epidemiologists and public health professionals have entered the public sphere. Although we’ve rapidly accepted epidemiology-based news, the public hasn’t been given the chance to fully absorb what all these terms really mean.

As with all disease tests, a false positive result on a COVID-19 test can cause undue stress on individuals as they try to navigate their diagnosis, take days off work and isolate from family. One high-profile example was Ohio Governor Mike DeWine, whose false positive result led him to cancel a meeting with President Donald Trump.

False negative test results are even more dangerous, as people may think it is safe and appropriate for them to engage in social activities. Of course, factors such as the type of test, whether the individual had symptoms before being tested and the timing of the test can also impact how well the test predicts whether someone is infected.

Sensitivity and specificity are two extremely important scientific concepts for understanding the results of COVID-19 tests.

In the epidemiological context, sensitivity is the proportion of true positives that are correctly identified. If 100 people have a disease, and the test identifies 90 of these people as having the disease, the sensitivity of the test is 90 per cent.

Specificity is the ability of a test to correctly identify those without the disease. If 100 people don’t have the disease, and the test correctly identifies 90 people as disease-free, the test has a specificity of 90 per cent.

This simple table outlines how sensitivity and specificity are calculated when the prevalence — the percentage of the population that actually has the disease — is 25 per cent:

                   Disease    Disease-free     Total
  Test positive     20,000           7,500    27,500
  Test negative      5,000          67,500    72,500
  Total             25,000          75,000   100,000

Sensitivity and specificity at 25 per cent disease prevalence: sensitivity = 20,000/25,000 = 80 per cent; specificity = 67,500/75,000 = 90 per cent. (Priyanka Gogna), Author provided

A test sensitivity of 80 per cent can seem great for a newly released test (like the made-up case numbers above).

Predictive value

But these numbers don’t convey the whole message. The usefulness of a test in a population is not determined by its sensitivity and specificity alone. When we use sensitivity and specificity, we are figuring out how well a test works when we already know which people do, and don’t, have the disease.

But the true value of a test in a real-world setting comes from its ability to correctly predict who is infected and who is not. This makes sense because in a real-world setting, we don’t know who truly has the disease — we rely on the test itself to tell us. We use the positive predictive value and negative predictive value of a test to summarize that test’s predictive ability.

To drive the point home, think about this: in a population in which no one has the disease, even a test that is terrible at detecting anyone with the disease will appear to work great. It will “correctly” identify most people as not having the disease. This has more to do with how many people have the disease in a population (prevalence) rather than how well the test works.

Using the same numbers as above, we can estimate the positive predictive value (PPV) and negative predictive value (NPV), but this time we focus on the row totals.

The PPV is calculated as the number of true positives divided by the total number of people identified as positive by the test.

                   Disease    Disease-free     Total    Predictive value
  Test positive     20,000           7,500    27,500    PPV = 20,000/27,500 ≈ 73 per cent
  Test negative      5,000          67,500    72,500    NPV = 67,500/72,500 ≈ 93 per cent

Positive and negative predictive value at 25 per cent disease prevalence. (Priyanka Gogna), Author provided

The PPV is interpreted as the probability that someone who has tested positive actually has the disease. The NPV is the probability that someone who tested negative does not have the disease. Although sensitivity and specificity do not change as the proportion of diseased individuals in a population changes, the PPV and NPV are heavily dependent on the prevalence.

Let’s see what happens when we redraw our disease table when the population prevalence sits at one per cent instead of 25 per cent (much closer to the true prevalence of COVID-19 in Canada).

                   Disease    Disease-free     Total
  Test positive        800           9,900    10,700
  Test negative        200          89,100    89,300
  Total              1,000          99,000   100,000

Sensitivity, specificity, PPV and NPV at one per cent disease prevalence: sensitivity = 80 per cent; specificity = 90 per cent; PPV = 800/10,700 ≈ 7 per cent; NPV = 89,100/89,300 ≈ 99.8 per cent. (Priyanka Gogna), Author provided

So, when the disease has low prevalence, the PPV of the test can be very low. This means that the probability that someone who tested positive actually has COVID-19 is low. Of course, depending on the sensitivity, specificity and the prevalence in the population, the reverse can be true as well: someone who tested negative might not truly be disease-free.
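These relationships follow directly from Bayes’ theorem, and the numbers above are easy to verify in a few lines of Python:

```python
def ppv(sens, spec, prev):
    """P(disease | positive test), by Bayes' theorem."""
    return (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    """P(no disease | negative test), by Bayes' theorem."""
    return (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)

# The same test (80% sensitivity, 90% specificity) at two prevalences:
print(round(ppv(0.80, 0.90, 0.25), 2))  # 0.73 -> 73% at 25% prevalence
print(round(ppv(0.80, 0.90, 0.01), 2))  # 0.07 -> 7% at 1% prevalence
print(round(npv(0.80, 0.90, 0.01), 3))  # 0.998
```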

False positive and false negative tests in real life

What does this mean as mass testing begins for COVID-19? At the very least it means the public should have clear information about the implications of false positives. All individuals should be aware of the possibility of a false positive or false negative test, especially as we move to a heavier reliance on testing this fall to inform our actions and decisions. As we can see using some simple tables and math above, the PPV and NPV can be limiting even in the face of a “good” test with high sensitivity and specificity.

Without adequate understanding of the science behind testing and why false positives and false negatives happen, we might drive the public to further mistrust — and even question the usefulness — of public health and testing. Knowledge is power in this pandemic.

Cloudy Backup & Restore

This week is a slight digression. We’ve been exploring the use case for disaster recovery (DR) to the cloud, as part of the business case for hybrid cloud. We’ve found there is a dubious case for using the cloud for disaster recovery. If you can DR a workload to the cloud, you have effectively migrated it to the cloud. With sufficient bandwidth in place you will be ready for multi-cloud operations with private cloud on-premises and public cloud.

In migrating a workload to the cloud you will of course architect it for high availability (HA) with fail-over to other zones or even regions. This is DR by design. You should also design it for portability using containers and Kubernetes clusters.

Backup & restore is an integral part of DR, but having backup policies in place is not in itself DR, which must be part of a larger business continuity plan.

However, you might have a business case for storing backups in the cloud based on low storage costs. But don’t overlook costs for bandwidth and data transfer. Backups should be immutable, i.e., WORM (write once read many times) images.

Backups can be kept in direct storage for quick access or in lower-cost archival storage if you can accept a retrieval delay. Some cloud service providers can also seamlessly extend your tape library for cloud storage, using your on-premises software. This could replace your existing off-site transportation and storage using automation.
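Using AWS as an example, tiering aging backups into archival storage can be automated with an S3 lifecycle rule. The sketch below builds the rule as a plain dict (the boto3 call that would apply it is commented out); the bucket name, prefix and day counts are illustrative.

```python
def backup_lifecycle_rule(prefix: str, archive_after_days: int,
                          expire_after_days: int) -> dict:
    """Build an S3 lifecycle rule: transition to Glacier, then expire."""
    if expire_after_days <= archive_after_days:
        raise ValueError("expiry must come after the archive transition")
    return {
        "ID": f"archive-{prefix}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": archive_after_days,
                         "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }

# Applying the rule (not run here) would look like:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="backups",
#       LifecycleConfiguration={"Rules": [backup_lifecycle_rule("daily/", 30, 365)]})

rule = backup_lifecycle_rule("daily/", 30, 365)
print(rule["Transitions"][0]["StorageClass"])  # GLACIER
```

For immutability, the bucket would additionally need S3 Object Lock (or your provider’s WORM equivalent) enabled, which this rule alone does not provide.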

We could call this hybrid cloud.

Using the Cloud as a Warm DR Site

To keep your business running you have a disaster recovery (DR) plan, right? The cloud can provide DR, but as we’ve discussed, it doesn’t make sense to have a hot DR site in the cloud. If you have a hot site, well, in effect you have already migrated to the cloud. If you use architectural designs with resilience (auto-healing) and high availability (HA), you will have enough DR built in to satisfy most requirements. So a hot site doesn’t make sense.

The cloud can serve as a warm site. This would apply for workloads that you’re normally reluctant to put in the cloud – for whatever reason – but you need DR and you can live with using the cloud for a short period.

The attraction is that you could have a tiny footprint in the cloud, maintaining your presence, and use templates and scripts (e.g., AWS CloudFormation) to rapidly spin up live production environments in an emergency.

To do this you will need:

  • Well-architected solutions with security, privacy, HA and DR built in. Ideally these solutions will run in containers on Kubernetes clusters.
  • Licence models allowing software to run in the cloud.
  • A means of transferring large volumes of backup data to the cloud. This should use immutable snapshots.
  • A plan for client desktops to gain access, log on and authenticate.
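The rapid spin-up step can be scripted against AWS CloudFormation. In the sketch below the client is passed in as a parameter, so a stub can stand in for boto3’s cloudformation client for offline testing; all stack and template names are made up.

```python
def activate_warm_site(cf_client, stacks):
    """Create each pre-written DR stack in order and return the stack IDs."""
    stack_ids = []
    for name, template_url in stacks:
        resp = cf_client.create_stack(StackName=name, TemplateURL=template_url)
        stack_ids.append(resp["StackId"])
    return stack_ids

# In an emergency this would run against boto3.client("cloudformation");
# here a stub records the calls so the control flow can be tested offline.
class StubCloudFormation:
    def __init__(self):
        self.calls = []
    def create_stack(self, StackName, TemplateURL):
        self.calls.append(StackName)
        return {"StackId": f"arn:aws:cloudformation:::stack/{StackName}"}

cf = StubCloudFormation()
ids = activate_warm_site(cf, [("dr-network", "https://example.com/net.yaml"),
                              ("dr-app", "https://example.com/app.yaml")])
print(cf.calls)  # ['dr-network', 'dr-app']
```

The important part is that the templates are written, versioned and rehearsed before the disaster; the script itself is the easy bit.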

Reciprocal Site Use Case – Disaster Recovery

A reciprocal or joint site could make business sense for two compatible organizations, such as within government. You would have to ensure and regularly test that there was reciprocal capacity.

In one instance we decided to test a client’s reciprocal arrangement and found that the client did not have the required security clearance to enter the reciprocal site. After this was resolved, we discovered there was insufficient capacity to load the backups.

The arrangement had been in place for years but no one had ever tested it in a disaster recovery exercise.

This use case has no cloud equivalent.

Hot Site Use Case – Disaster Recovery

There is a use case for a hot site where a solution must be available 24/7 with failover within a few hours.

The reason for the short delay is that a hot site is vulnerable to the same ransomware and other attacks as the main site. For protection, a hot site should be offline from the network of the main site except for controlled windows for data synchronization. For additional protection, data should be synchronized to offline storage before it is brought online.

This is why fail over could take several hours. In a ransomware attack you do not want to fail over in real time to a backup site that has also been infected.

Using the Cloud

The cloud could serve as a hot site in a hybrid cloud arrangement. But why would you do this?

It is better to design a high availability solution in the cloud in the first place and move the business there completely. Protection against security threats like ransomware would be provided by network design and a robust protocol for data backup and restore.

So we will not consider this further.

Cold Site Use Case – Disaster Recovery

It is difficult to imagine the use case for a cold site. It assumes you can access the main site, remove the servers, transport them to the alternate site and establish connectivity. Unless you have servers somewhere else that you can requisition, it could easily take six months to purchase equipment. Few organizations can survive a business interruption longer than a week.

Using the Cloud

The cloud could serve as a cold site if you can meet several requirements:

  • A blueprint to build and test the required solutions
  • A means of transferring large volumes of backup data to the cloud
  • A plan for client desktops to access, log on and authenticate

There might be some edge cases where this is feasible, perhaps for a small business, but in general it is hard to imagine this being a viable business strategy.

Even if you can rebuild from scratch in the cloud, transferring large amounts of data is a logistical problem that might require using time-consuming solutions like the AWS Snowball appliance.

An Edge Case

There is one situation where this case might apply. Suppose you do not have a disaster-recovery plan but you do have a good backup policy. Then you lose a data centre. You could rebuild as rapidly as possible in the cloud.

We will discuss this later in more detail as an accidental disaster-recovery plan.

Disaster Recovery Strategies

Previously we identified these strategies for disaster recovery:

  • Cold site
  • Warm site
  • Hot site
  • Reciprocal site

A cold site is simply a building in another location with floors, utilities and HVAC. It does not have IT infrastructure like servers and networks installed. The recovery process includes purchasing and installing servers and networks, transporting backup tapes to the new location, and rebuilding computer services.

A warm site has hardware and connectivity already established, usually on a small scale to keep investment costs down. Older backups might already be on site as part of a backup rotation process. Transporting the latest backups to the recovery site might take some time, depending on the distance between sites.

A hot site is a complete duplicate of the original site, with near-complete backups of user data or even fully mirrored data with real-time synchronization. The goal is to be up and running within hours from a technical perspective.

A reciprocal or joint site is an agreement between two organizations to operate a joint backup site. This can be a reciprocal agreement to provision a warm site at each other’s data centers. This avoids the cost of building and maintaining a facility.