The 5 Legal and Data Security Risks of AI Use in Software Development

Jul 25, 2023
Jonathan Sharabi

Generative AI (artificial intelligence) has seen adoption across nearly every industry over the last year. Recent research has found that software development is a top business function for adoption, followed by marketing and customer service, and use will only grow as AI technologies continue to advance at a rapid pace. On April 6, 2023, news broke that Samsung had discovered employees pasting confidential data into ChatGPT. As an emergency measure, the company limited ChatGPT inputs to 1,024 bytes.

The most popular generative AI solutions among developers are OpenAI’s ChatGPT and GitHub Copilot. Both leverage large language models (LLMs) trained on massive amounts of text and code and, when prompted, suggest code, test cases, and explanations. While ChatGPT is widely used beyond development, Copilot is an AI-powered pair programming tool built specifically for coding.

Read on to learn how AI is being used by developers, its potential legal and data security impact, and how to mitigate risks associated with its use.

How Is Generative AI Used in Software Development?

Here are a few ways developers are using AI tools to improve their productivity.

  • AI Code Generation and Completion
    AI generates code based on natural language prompts and automatically suggests additional code based on contextual information. This allows developers to write code from higher-level abstract thoughts and avoid writing low-level boilerplate code (see the first sketch after this list).
  • Code Translation
    AI code generators help developers automatically port software to other platforms. Many of these solutions support multiple languages, so developers can prompt ChatGPT or Copilot to translate legacy applications to newer tech stacks. If developers have questions about a particular code block, AI can also analyze it and provide a basic explanation without the need for in-depth research.
  • Code Optimization and Refactoring
    Generative AI reviews existing code and automatically refactors it to reduce complexity and improve performance. It can shrink an application by removing unused code, speed it up through parallel processing or compiler optimizations, and restructure it for readability. AI can also spot optimization opportunities that human developers might overlook.
  • Software Testing and Debugging
    AI helps create unit and functional tests from natural language descriptions, bringing greater efficiency to software testing. A code-based AI solution can analyze source code to detect bugs early in the development process, and ChatGPT may even be able to explain what’s wrong with a piece of code, making it easier for developers to debug the issue (see the second sketch after this list).
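To make the first item concrete, here is a hypothetical illustration of code completion: a developer writes only the function signature and docstring, and an assistant such as Copilot suggests a body along these lines. The `slugify` function is invented for illustration and is not output from any specific model.

```python
import re

# The developer types the signature and docstring; everything below the
# docstring is the kind of suggestion an AI pair programmer might produce.
def slugify(title: str) -> str:
    """Convert a post title to a URL-friendly slug."""
    slug = title.strip().lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse spaces/punctuation
    return slug.strip("-")

print(slugify("The 5 Legal and Data Security Risks of AI Use"))
# -> "the-5-legal-and-data-security-risks-of-ai-use"
```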
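For the last item, prompting a model with something like “write pytest tests for slugify” might yield tests close to the following. Again, this is a hypothetical sketch, not real model output; it reuses the `slugify` function from the previous example and runs with `pytest`.

```python
# Hypothetical AI-generated tests for the slugify() sketch above.
def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_slugify_collapses_punctuation():
    assert slugify("AI & You: 2024!") == "ai-you-2024"
```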

What Are the 5 Legal and Data Security Risks of Generative AI?

Although there are many benefits to integrating AI into software development workflows, there are also some data security implications for software companies. Here are five potential legal and security risks:

1. Open Source License Violation

Many AI models are trained on public code repositories like GitHub, which means the code they generate might violate open source licenses. Copilot, for example, does not include any attribution to the original code author, a key requirement of most open source licenses.

2. Copyright Law Violation

Besides the legal implications of how the models are trained, these AI solutions could also reproduce existing code verbatim. Developers could inadvertently use and distribute this copied code, which violates copyright laws and puts the business at risk.

3. Security Vulnerabilities Exposure

Generated code can pose security risks that may be hard for developers to spot if they didn’t write the code themselves. For example, AI-generated code could contain security vulnerabilities that malicious actors can exploit, and a single flawed suggestion can compromise the security of the entire application, not just the AI-generated portion.
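As an illustration, an assistant asked to “look up a user by name” could plausibly suggest the first, injectable pattern below; the parameterized version is the safe equivalent. This is an invented sketch using Python’s built-in sqlite3 module, not output from any specific model.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable pattern: user input is interpolated straight into SQL,
    # so input like "x' OR '1'='1" returns every row (SQL injection).
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Safe pattern: a parameterized query lets the driver escape input.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```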

4. Proprietary Information Disclosure

AI-generated code may inadvertently leak proprietary code or confidential algorithms owned by the organization. By disclosing these trade secrets, organizations could lose their competitive advantage and waste resources spent on research and development.

5. Sensitive Data Leakage

AI-generated code could unintentionally expose sensitive data that should remain confidential. For example, the code could contain hardcoded credentials, database connection details, or even financial data and personally identifiable information (PII) about customers.
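The first assignment below shows the kind of hardcoded secret that can surface in generated code (the URL and credentials here are invented for illustration); the second shows the conventional fix of loading secrets at runtime instead of embedding them in source.

```python
import os

# Risky: a generated snippet may embed real connection strings or keys.
DB_URL = "postgres://admin:SuperSecret123@db.internal:5432/prod"  # hardcoded secret

# Safer: keep secrets out of source control; read them from the
# environment or a secrets manager at runtime.
DB_URL = os.environ["DATABASE_URL"]
```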

How Developers Can Use Generative AI Safely with Cyera’s SafeType

As you can see, AI is transforming the way code is written and adding a new level of automation to the software development industry. Developers who want to adopt ChatGPT, Copilot, or another solution should stay up to date with the latest trends in AI safety.

ChatGPT and other AI solutions can collect personal information from chat sessions and share it with other organizations. That means the prompts and code within these chats introduce privacy risks that development teams need to consider.

SafeType is an open source extension for Chrome and Edge browsers developed by Cyera Labs. The extension alerts users when they’re about to input sensitive data during a ChatGPT session and enables them to automatically anonymize the information. This is one of many ways to mitigate privacy risks associated with using ChatGPT for software development.
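To give a feel for this kind of safeguard, here is a minimal sketch of client-side prompt screening in the spirit of SafeType: it flags likely secrets and PII before a prompt leaves the machine and substitutes labeled placeholders. This is an invented example, not SafeType’s actual implementation, and the patterns shown are far from exhaustive.

```python
import re

# Illustrative detectors only; a real tool covers many more data types.
PATTERNS = {
    "email":   re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "ssn":     re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def anonymize(prompt: str) -> str:
    """Replace likely sensitive values with labeled placeholders."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt

print(anonymize("Debug: connect('jane.doe@acme.com', 'AKIAABCDEFGHIJKLMNOP')"))
# -> "Debug: connect('<email>', '<aws_key>')"
```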

SafeType by Cyera Labs

Follow Cyera as we continue to explore the data security risks of generative AI and discover ways to use AI safely. Please join our public Slack community #cyeralabs and share your thoughts with us. And if you don't have SafeType yet, download it here!