Scaling AppSec With an SDL for Citizen Development
A blog version of the talk presented at BlueHat 2024
Microsoft has the largest citizen development environment in the world—with almost 2 million apps, automations, copilots and agents, 10 million credentials and 55 thousand developers. A few weeks ago at Microsoft’s BlueHat security conference, we gave a talk sharing how our teams built the security program behind that environment together, operating at 1000x the scale of a typical AppSec program and remediating an incredible 95% of vulnerabilities within 4 months. Don is a Power Platform Security Architect, focused on addressing unintended consequences of citizen development within Microsoft’s internal environment. Michael is the Co-Founder and CTO of Zenity. We are extremely thankful to Microsoft for their willingness to share their story and provide a trail to follow for others who are just starting out with AppSec for low-code/no-code (LCNC). This is a post version of that talk, which is available on YouTube, here.
Thank you for reading, Don and Michael.
The talk
Table of Contents
Introduction
The scale of Microsoft’s internal low-code/no-code environments
These mind-boggling numbers represent the internal Microsoft citizen development environment today. Where would you even begin?
WHY so many devs / apps / creds?
Low-code/no-code (LCNC) tools, and increasingly GenAI, have made it intuitive and easy to create applications within the enterprise. A business user can chat with a bot, the bot creates an app.
That app outlives the chat; for all practical purposes it is a full enterprise app. It has an identity, it can operate on behalf of users, it can be shared, and it can have access to data. We’re no longer in a world where only professional developers can create within the enterprise environment. Today, EVERYONE is a developer.
Covid Health Check app, built with Power Platform
Michael: Two years ago, I first went to Microsoft campus to collaborate with Don. It was just after COVID, and you had to upload a proof of vaccination to enter Microsoft facilities. You did that through the COVID Health Check app.
Don: This was one of the first low-code apps to be created within Teams. Of course, this app handles sensitive information about a person’s medical situation. It doesn’t matter that this app was built with low-code rather than code, it still has to adhere to the same security, privacy and compliance standards. But we cannot ask citizen developers to become experts in how to handle sensitive data.
Low-code/no-code platforms are embedded into enterprise business applications
Michael: One thing you could be thinking, to get yourself off the hook, is that this doesn’t apply to you. Maybe you work for a bank, or a government agency, or another highly regulated environment, and you might think—we’ll never allow citizen development in our environment. Well, chances are you’ve already got it. LCNC and GenAI development tools are embedded within the business applications you already rely on – ServiceNow, Salesforce, Microsoft 365, and many more. Microsoft itself has seven different LCNC platforms used internally, all covered within the LCNC security program.
In his 2019 Ignite keynote, Satya Nadella predicted that LCNC would power a rebuilding of the entire enterprise stack within five years: “We are going to have 500 million applications that are going to get created, new, by 2023. Just to put that in perspective, that's more than all of the applications that were created in the last 40 years.”
GenAI triggers an almost 3x growth in the number of apps within the Microsoft environment
This was before GenAI, which made app building even easier and more approachable. The Microsoft environment alone has 1.7 million apps, automations, copilots and agents today, and has grown almost 3x in the last year.
WHY is this important?
Michael: So we have two or three orders of magnitude more apps than we know of within the enterprise. But why is it important for you to invest your time in addressing their security risk? What could go wrong?
In this section we shared four stories of LCNC development gone wrong. Every story is self-contained, so we’ll release them as independent blogs. In the meantime, we encourage you to check out Michael’s BlackHat USA 2023 talk titled Sure, Let Business Users Build Their Own. What Could Go Wrong?
HOW to fail at AppSec
Or: what didn’t work.
Michael: We know that we have tons of these apps built within our organization, and we know their security issues could be severe so it’s important we address them quickly. What now?
Our first inclination was to try out AppSec best practices. Let’s explore three of them.
Best practice #1 - focus on crown jewels
Best practice #1 – focus on crown jewels. Instead of trying to cover millions of assets, we should focus only on the ones that really matter. This makes a lot of sense, but unfortunately, it isn’t helpful. Check out the number of active credentials to different crown jewel business applications across the Microsoft stack. Focusing on 1.35M SharePoint connections, for example, doesn’t really give us any meaningful scope reduction.
Best practice #2 - get developer buy-in
Best practice #2 – get developer buy-in. Can we help educate developers, increasing security awareness, to avoid having security issues in the first place? In the screenshot above, you’ll see a Dataverse table that contains sensitive data, i.e. SSNs, in plaintext. The fact that these are typically stored in the Default Power Platform environment makes it even worse, as it can be readable by the entire Entra ID tenant. These kinds of mistakes are common. Try to imagine having a conversation with someone in finance, or sales, who has built a useful app that drives the business forward and requires processing sensitive information. Could we really expect them to be aware of proper sensitive data handling procedures?
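To make the problem concrete, here is a minimal sketch of the kind of check a security team might run over exported rows. The table contents, column names and pattern are purely illustrative assumptions; a real scan would page through records via the Dataverse Web API and use broader detection rules.

```python
import re

# Minimal sketch of a sensitive-data check over sample rows pulled from a table.
# The rows and column names are hypothetical, for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

sample_rows = [
    {"employee": "A. Example", "notes": "SSN 123-45-6789 captured during onboarding"},
    {"employee": "B. Example", "notes": "badge renewed"},
]

for row in sample_rows:
    for column, value in row.items():
        if isinstance(value, str) and SSN_PATTERN.search(value):
            print(f"Plaintext SSN-like value found in column '{column}' - flag for review")
```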
Best practice #3 - apply the Security Development Lifecycle (SDL)
Best practice #3 – apply the Security Development Lifecycle (SDL). The SDL is a set of practices and tools that help developers build more secure software, introduced by Microsoft over 20 years ago. Could we apply the SDL to citizen development?
How well does SDL Guidance fit with LCNC?
Well, not really.
Don: My analysis of our internal SDL requirements shows that only about 30% of the technical requirements directly apply to LCNC, and many of those are squishy. As an example, consider the requirement to use encrypted communication. Of course, HTTPS is enforced at the platform level, but what about your application's connections? Is the citizen developer even using HTTPS for connections that use URLs? Does the back-end service even have HTTPS turned on? Is it patched and properly configured? If not, your sensitive data could be plaintext in transit. These are under-the-hood nuances that just don’t fall within an LCNC developer’s awareness.
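As an illustration, here is a minimal sketch of a transport check over a hypothetical inventory of connection endpoints. The field names and URLs are assumptions for the example, not an actual Power Platform export schema.

```python
from urllib.parse import urlparse

# Minimal sketch: flag LCNC connections whose endpoints are not HTTPS.
connections = [
    {"app": "ExpenseTracker", "connector": "CustomInvoiceAPI", "base_url": "http://invoices.contoso.internal/api"},
    {"app": "CovidHealthCheck", "connector": "HRService", "base_url": "https://hr.contoso.com/api"},
]

def insecure_transport(conn: dict) -> bool:
    """True when the connection's endpoint uses plain HTTP (plaintext in transit)."""
    return urlparse(conn["base_url"]).scheme != "https"

for conn in connections:
    if insecure_transport(conn):
        print(f'{conn["app"]}/{conn["connector"]}: uses {conn["base_url"]} - data may travel in plaintext')
```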
Microsoft relies heavily on tooling to enforce SDL compliance. But these tools require access to code or binaries, none of which are available for scanning with LCNC.
SDL requirements are written for a technical audience. From the perspective of a citizen developer, SDL requirements are technobabble at worst and fail to provide LCNC-specific guidance at best.
Different organizations and roles play a key part in the SDL: business, engineering, Quality Assurance and Operations all own a part of the puzzle. But with LCNC, business users move rapidly from envision to create to publish, closing a fast loop and iterating straight into production.
Applying AppSec best practice as-is doesn’t work
Michael: So AppSec best practice didn’t really get us anywhere.
But wait. If building these apps is easy, shouldn’t fixing vulns be easy as well?
This leads us to the idea of auto-fix. For some kinds of security vulnerabilities, we know enough about the app, its environment and surrounding components that we can recommend just the right configuration or “code” change to fix the issue without negative implications for the business.
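A minimal sketch of what one such rule could look like, assuming hypothetical asset metadata (sharing scope and active-user telemetry). It illustrates the idea only; it is not one of the actual rules used in the program.

```python
# Minimal sketch of an "auto-fixable" rule over hypothetical asset metadata.
# The point: when we know enough about the app and how it is used, the safe
# configuration change can be computed without asking the developer anything.
def propose_fix(asset: dict):
    # Illustrative rule: the app is shared tenant-wide, but telemetry shows
    # only a handful of named users ever run it, so sharing can be narrowed.
    if asset["shared_with"] == "everyone" and len(asset["active_users"]) <= 5:
        return {"action": "restrict_sharing", "share_with": asset["active_users"]}
    return None  # not auto-fixable; route it to the developer-driven workflow

asset = {"name": "TeamTracker", "shared_with": "everyone", "active_users": ["alice", "bob"]}
print(propose_fix(asset))  # -> {'action': 'restrict_sharing', 'share_with': ['alice', 'bob']}
```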
Some vulnerabilities can be automatically fixed without negative business impact
So now we’ve got a plan.
We’ll auto-fix whatever we can, gaining early success. Success gets us management buy-in. Management buy-in gets us the resources and backup we need to scale a program that can fix all vulnerabilities, whether they can be auto-fixed or not.
Our plan
HOW we made it work
Our goals
Don: We set clear goals for the program. We want to remediate all existing vulnerabilities, with 2-3 dedicated headcount, within 6 months (we finished in 4). We want a Minimum Viable Product (MVP) that relies on self-service and auto-fix, so we can scale this up without needing to scale our team.
Applying automated remediation: an MVP
Don: Auto-fix requires two things. First, we need to have enough confidence in our fix to know that it’s not going to cause negative business impact. Second, we need to be technically able to put the asset in a secure state. We remediate vulns overnight (if you’re in the Americas). This allowed us to make very quick progress, remediating as many as ~20K vulnerabilities overnight. Our program was able to auto-fix 25% of existing vulnerabilities.
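In rough Python, the decision boils down to the two gates just described. The violation object and its helpers are hypothetical names for this sketch, not the actual tooling behind the program.

```python
# Minimal sketch of the two auto-fix gates, using hypothetical helper names.
def try_auto_fix(violation) -> bool:
    # Gate 1: are we confident the fix causes no negative business impact?
    if not violation.fix_is_safe_for_business:
        return False
    # Gate 2: can we technically put the asset into a secure state?
    if not violation.can_set_secure_state:
        return False
    violation.apply_fix()     # e.g. flip a sharing or authentication setting
    violation.notify_owner()  # tell the citizen developer what changed and why
    return True

# Overnight batch: anything that passes both gates is fixed silently; the rest
# flows into the developer-driven 30-day workflow described next.
def nightly_run(violations):
    return [v for v in violations if not try_auto_fix(v)]
```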
What happens when automated remediation cannot be applied
Don: For things we can’t auto-fix, we give the citizen developer clear instructions and a 30-day window to fix it. If they fail to comply, we delete the asset.
Stay green
Don: So far, we’ve discussed only existing vulns, i.e. brownfield. But of course, we want to ensure the tenant stays in a secure state. We continuously identify net new risk introduced into the tenant, i.e. greenfield, and apply the same process to it. This ensures that any net new risk is remediated either immediately or after a 30-day fix window.
Remediation workflow
Don: Here’s a deep dive into the remediation workflow when auto-fix is not possible. The 30-day clock starts ticking when the LCNC developer receives one of our mails, whether we’re running a burn-down campaign of existing violations or a new violation comes in from Zenity. We send an initial mail to the LCNC developer and, if necessary, send again after 14 days and again after 23 days. At any point in the process, if the issue is fixed, we drop out of the process. There are legitimate cases where the violation is a false positive: yes, you may have legitimate business reasons for allowing Guests to run your app, workflow, copilot, etc. If so, the LCNC developer provides a reason and the violation is closed. I fully acknowledge that self-attestation has issues because it’s prone to abuse, but we had to start somewhere. The LCNC developer is given step-by-step instructions, and we have a safety valve for people who have questions. Depending on your environment strategy, you may want to move the assets into a separate environment, typically developer environments; that process was handled by our close partners, the Power Platform Governance team.
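The schedule itself is simple enough to sketch. The helpers below are hypothetical; the real workflow is driven by mail plus the self-service dashboard shown below.

```python
from datetime import date, timedelta

# Minimal sketch of the escalation schedule described above: initial mail at
# day 0, reminders at days 14 and 23, deletion at day 30. The violation object
# and its helpers are hypothetical names for illustration.
REMINDER_DAYS = {14, 23}
DEADLINE = timedelta(days=30)

def process_violation(v, today: date) -> None:
    if v.is_fixed or v.has_accepted_attestation:  # fixed, or justified as a false positive
        v.close()
        return
    age = today - v.first_notified                # first_notified is the date of the initial mail
    if age >= DEADLINE:
        v.delete_asset()                          # non-compliance: the asset is removed
    elif age.days in REMINDER_DAYS:
        v.send_reminder()                         # second mail at day 14, final mail at day 23
```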
SharePoint site with step-by-step remediation instructions
First email notification sent to the LCNC developer
Final email notification sent to the LCNC developer
Don: These are examples of the step-by-step instructions that an LCNC developer receives, alongside the first and last email notices. We worked with an editor to ensure the content was concise, clear and to the point.
Self-service violation dashboard for LCNC developers
Don: These emails route developers to a self-service dashboard. I have blacked out some of the connection names to prevent information disclosure.
The first connection may or may not be a security violation. Not everything is in the cloud, and there will be times when you have to use an on-prem connector.
The bottom two violations are loosely coupled.
“Connection is using a shareable authentication method” is really saying “you’re not using Entra ID authentication, also known as Windows Integrated authentication.”
Entra ID has built-in security controls that make it very robust compared to, say, a mere user ID and password.
Without Entra ID you can get side effects specific to the data source, such as the data being accessible to the entire tenant, or the connection being shareable without your knowledge or consent.
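Expressed as a check, the rule is roughly the following. The connection record and the auth-type labels are assumptions made for this sketch, not actual platform values.

```python
# Minimal sketch of the "shareable authentication method" violation. A connection
# that authenticates with Entra ID acts on behalf of each individual user;
# anything credential-based (API key, username/password, connection string) is a
# shareable secret that others can reuse without the owner's knowledge.
ENTRA_ID_AUTH_TYPES = {"oauth_entra_id"}

def uses_shareable_auth(connection: dict) -> bool:
    return connection["auth_type"] not in ENTRA_ID_AUTH_TYPES

conn = {"name": "SQL reporting connection", "auth_type": "basic_username_password"}
if uses_shareable_auth(conn):
    print(f'"{conn["name"]}" uses a shareable authentication method - flag for review')
```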
Playbooks power both brownfield and greenfield automation
Don: Playbooks are what made our brownfield automation work equally well for greenfield.
A burn-down campaign might focus on misconfigurations that can’t be silently remediated.
Once we’re in Stay Green, when a new violation is discovered, within hours it’s either auto-fixed or we’ve sent mail and the 30-day clock is ticking.
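A minimal sketch of the shape of a playbook, with hypothetical fields and helpers. It is not the actual automation, just the idea that one definition serves both modes: a one-time burn-down over existing violations (brownfield) and the continuous Stay Green loop over newly detected ones (greenfield).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class Playbook:
    violation_type: str
    auto_fix: Optional[Callable] = None  # silent remediation, when safe
    instructions_url: str = ""           # step-by-step guidance for the developer
    grace_period_days: int = 30

def run(playbooks: dict, violations: Iterable) -> None:
    for v in violations:                 # source can be a backlog export or a live feed
        pb = playbooks.get(v.type)
        if pb is None:
            continue
        if pb.auto_fix is not None and pb.auto_fix(v):
            continue                     # fixed silently; nothing to mail
        v.open_ticket(pb.instructions_url, pb.grace_period_days)
```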
Results, showing we were able to achieve our goals within four months
Don: I’ll be honest, some of our colleagues didn’t think we had a very good chance of success. The issues standing in our way felt like “can’t get there from here.” These same folks were very gracious in celebrating our victory when we did succeed. But as always, the eternal question: what are you doing for me now?
So we’re expanding the program to other LCNC platforms and services. Power BI, Salesforce, ServiceNow and others are on our roadmap with varying degrees of completion. Which is why I’ve been emphasizing LCNC all-up rather than being specific to Power Platform.
Vulnerability remediation over time, showing 95% remediation within four months
Don: Here’s a visual representation of the success of our Get-to-Green. By the time we got to June, the only open violations were ones that had come in within the last 30 days.
95% of vulns were fixed within 4 months.
Takeaways
Leverage industry-standard security risk categorization
OWASP LCNC Top 10
OWASP LLM Top 10
Don: The SDL was the spine of our objectives, but we looked to other sources to decide which violations to resolve first in our Get-to-Green campaigns. The OWASP LCNC Top 10 has been gaining traction, from my understanding especially with CISOs. In fact, Michael and I both invite you all to participate. We are due for a refresh of this list in 2025.
With LCNC’s increasing embrace of AI, the OWASP LLM Top 10 is increasingly applicable. Not every risk category shown here applies directly to the LCNC space, at least not yet. As fast as things are moving, that could be a very different answer all too soon.
We needed to prioritize what to fix first
We merged similar OWASP Top 10 categories together, reviewed our SDL gap analysis, and also pivoted on Senior Leadership Team priorities.
Campaigns included:
• Guest and/or Access Control
• AI and/or Copilot issues
• Oversharing of data
• Sensitive Data Leakage
• Hardcoded Secrets
• Misconfiguration & Miscellany
It may seem like Oversharing of data and Sensitive Data Leakage are tightly coupled, but there was enough of a distinction in the scanning rules that it was worth breaking them out (see the sketch below).
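As a rough illustration of how detected violations could be bucketed into the six campaigns listed above: the keyword rules here are assumptions for the sketch, not the actual scanner logic.

```python
# Minimal sketch: assign a violation to one of the six campaigns by keyword.
CAMPAIGN_KEYWORDS = {
    "Guest and/or Access Control": ["guest", "anonymous", "everyone"],
    "AI and/or Copilot issues": ["copilot", "agent", "prompt"],
    "Oversharing of data": ["shared with", "tenant-wide"],
    "Sensitive Data Leakage": ["ssn", "pii", "plaintext"],
    "Hardcoded Secrets": ["secret", "api key", "password"],
    "Misconfiguration & Miscellany": [],  # catch-all
}

def assign_campaign(violation_title: str) -> str:
    title = violation_title.lower()
    for campaign, keywords in CAMPAIGN_KEYWORDS.items():
        if any(keyword in title for keyword in keywords):
            return campaign
    return "Misconfiguration & Miscellany"

print(assign_campaign("Hardcoded API key in cloud flow"))  # -> Hardcoded Secrets
```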
Remediation statistics by campaign and severity
Don: We envisioned moving from one campaign to the next in a stately manner. Reality was messier than that: any given mailing was a mix of two to three campaigns, based on Senior Leadership priorities, and our first silent remediation campaign was a mix of all six categories. But the categorization was useful for us in the actual campaigns as well as in reporting.
Michael: If we take a step back and think about it, it’s clear that giving developer-level power to everyone within the enterprise means we have a responsibility to help them make security-conscious choices. This is not a new concept: it is called the Shared Responsibility Model.
The Shared Responsibility Model for LCNC
We’re used to this model in the cloud—the cloud provider is responsible for the security of the platform and for providing secure building blocks, but you, the builder, are responsible for what you build. Only your organization understands the risk tolerance, business criticality and operational realities of your business. You are in charge of what you build.
The customer’s side of the Shared Responsibility Model for LCNC
In many cases we, the users of LCNC, do not fully take on that responsibility. We’re letting business users build their own, without security involvement. What could go wrong?
Applying the Shared Responsibility Model to Power Platform
Don: I expanded on the Shared Responsibility Model to provide clear guidelines for the different roles and domains that are part of a successful Power Platform program. The LCNC developer, the admin, the security team and the LCNC platform itself all share a role in securing it.
We have a forthcoming whitepaper that covers this at a high level, and soon after that, a paper with tactical, actionable guidance specific to Power Platform for this table.
This has proven very clarifying for us: LCNC development carries assumptions about what you don’t have to worry about, and about actions you would otherwise take with full-code development. How this impacts our training, tooling and processes will vary from organization to organization.
We achieved the SDL in spirit
As with the “real” SDL, the core of our solution is tooling. In this talk we haven’t focused on the scanning tool we used so much as on what it took for us to comply with the SDL in spirit.
What did we learn?
In conclusion
Our conclusions
LCNC is powerful and is being used in most enterprises today, even if they never made a deliberate, top-to-bottom choice to adopt it
Building a successful AppSec program for LCNC, operating at 100-1000x the scale of a typical program, is possible
You can have both productivity and security if you apply the SDL, at least “in spirit”.
More resources