Transparency Note for Copilot in SQL Server Management Studio

What is a Transparency Note?

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft's Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system or share them with the people who will use or be affected by your system.

Microsoft's Transparency Notes are part of a broader effort at Microsoft to put our AI Principles into practice. To find out more, see the Microsoft AI principles.

The basics of Copilot in SQL Server Management Studio

Introduction

SQL Server Management Studio (SSMS) is a Windows-based application freely available for customers to download and install on any machine they use to connect to a SQL database. SSMS is highly valued by its core community of data professionals and serves a diverse user base with varying levels of SQL knowledge and usage patterns. Copilot in SSMS is a set of AI assistants available within the application that provides self-help and Natural Language to SQL (NL2SQL) capabilities to SQL users. Copilot in SSMS supports SQL Server, Azure SQL Database, Azure SQL Managed Instance, SQL database in Fabric, and Fabric DW. Users can submit a prompt to Copilot, and the response returned can vary in length and content. Depending on the question asked, Copilot might provide a short answer, a list of suggestions or options, or Transact-SQL (T-SQL) code.

Key terms

Endpoint: An endpoint is an address (for example, a URL) used by Copilot in SSMS to connect to a deployment. For Azure OpenAI, this is part of resource creation.

Deployment: A deployment is a hosting structure in Azure OpenAI, chosen by the customer, that determines the billing model in use.

Model: A model can be a large language model (LLM) or a small language model (SLM), created and managed internally by the customer or available from an external source (for example, Azure OpenAI or other third-party offerings). For Azure OpenAI deployments, the model is chosen from a list of available options.

API Key: The API key is an option available with Azure OpenAI to support an alternate method of accessing the endpoint and deployment (instead of authenticating to Azure using the user's Entra ID).

User Prompt: The user prompt is the text sent to Copilot in SSMS to execute a specific task or provide information. For example, a user might input the following prompt: What is the compatibility level for this database and how do I change it?

Response: The reply provided to the user from the model to answer their inquiry.

Copilot chat: The tool window within SSMS where the user can submit a prompt or question and get a response.

Natural Language to SQL (NL2SQL): Conversion of conversational text to T-SQL; the ability to create a T-SQL query based on the input text provided.

Doc/Explain/Fix: Options available from the query editor when query text is highlighted, to document, explain, or fix the query text, respectively.

Capabilities

System behavior

The Copilot in SSMS feature is available as an extension in SSMS 21 that customers can choose to install. Once configured, Copilot provides AI assistance through a sidecar chat and directly within the Query Editor. Within the sidecar chat, the user submits prompts to Copilot and responses are provided within the dialog. In the Query Editor, the user can select query text and get help documenting, explaining, or fixing the text via the right-click menu, menu icons, or keyboard commands.

Copilot in SSMS has context about the database to which it is connected, both in terms of the SQL offering (for example, SQL Server, Azure SQL Database, Azure SQL Managed Instance, SQL database in Fabric, and Fabric DW) and the objects in the database, such as tables, columns, indexes, and views. Responses are tailored to the SQL version and engine edition. Copilot in SSMS can answer questions for any supported SQL offering, whether they are "how to," "what is," or "help me do" questions. In addition, Copilot in SSMS will continue to evolve as the SQL platform adopts new capabilities.
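
The "context" referred to here corresponds to catalog metadata that any connected user can query directly. The following T-SQL is purely illustrative of that kind of metadata; it is not a description of how Copilot gathers its context.

```sql
-- Illustrative only: the kind of catalog metadata that makes up "database context."
-- This does not describe Copilot's internal implementation.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,   -- SQL version
       SERVERPROPERTY('EngineEdition')  AS engine_edition;    -- SQL offering (for example, Azure SQL Database)

-- Schema-level objects: tables, columns, and data types visible to the connected user.
SELECT s.name  AS schema_name,
       t.name  AS table_name,
       c.name  AS column_name,
       ty.name AS data_type
FROM sys.tables AS t
JOIN sys.schemas AS s  ON s.schema_id = t.schema_id
JOIN sys.columns AS c  ON c.object_id = t.object_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
ORDER BY s.name, t.name, c.column_id;
```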

Copilot in SSMS follows Responsible AI guidelines defined by Microsoft. Neither SSMS nor Copilot retains any information from the user (for example, the prompt sent, the response returned, information about the customer database or environment, the queries that are executed, or the output returned from queries). Sharing feedback about Copilot is voluntary and can be done through the UI using the thumbs up/down buttons. The user can also share additional details, including the prompt and response, by logging a feedback item (https://aka.ms/ssms-feedback).

Use cases

Intended uses

Copilot in SSMS can be used in multiple scenarios. The system's intended uses include:

  • Self-help: Using the sidecar chat, the customer can ask questions about SQL, their database, or their environment. Examples include "What is the compatibility mode for my database?" or "What is the most recent backup?"

  • Writing T-SQL: Using the sidecar chat, the customer can use natural language to write T-SQL queries. Examples include "write a query to list permissions and roles for database users in Azure SQL DB" or "list all orders created in the last week that have more than 10 items or a total greater than $1000."
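
As an illustration of the second scenario, the "orders created in the last week" prompt might produce T-SQL along these lines. The table and column names (Sales.Orders, Sales.OrderLines, and so on) are hypothetical; the actual query Copilot generates depends on the schema of the connected database.

```sql
-- Hypothetical output for: "list all orders created in the last week that have
-- more than 10 items or a total greater than $1000."
-- Table and column names are illustrative, not part of any specific database.
SELECT o.OrderID,
       o.OrderDate,
       COUNT(ol.OrderLineID)           AS ItemCount,
       SUM(ol.Quantity * ol.UnitPrice) AS OrderTotal
FROM Sales.Orders AS o
JOIN Sales.OrderLines AS ol ON ol.OrderID = o.OrderID
WHERE o.OrderDate >= DATEADD(DAY, -7, CAST(GETDATE() AS date))
GROUP BY o.OrderID, o.OrderDate
HAVING COUNT(ol.OrderLineID) > 10
    OR SUM(ol.Quantity * ol.UnitPrice) > 1000;
```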

Considerations when choosing a use case

We encourage customers to use Copilot in SSMS to support their solutions or applications. However, here are some considerations when choosing a use case:

  • Database help: Connect to the database that you want to ask questions about or get help with. Ask questions about configuration or settings, the objects in the database, or how to complete a task, such as taking a backup of the database or enabling Query Store (a sketch of such queries follows this list).

  • General help: Ask general questions about SQL or how to use SQL Server Management Studio (SSMS). Questions can be very specific, such as asking for the latest cumulative update for the version of SQL Server, or more general, such as what to consider before upgrading to the latest release.

  • T-SQL assistance: Connect to the database for which you want help writing a new T-SQL query or fixing an existing one. Use natural language to request the information you want to see returned from the database, and Copilot in SSMS will turn the prompt into Transact-SQL (T-SQL) that can be executed in the editor. Provide an existing query to Copilot in SSMS and ask for help with fixing any errors it generates, or ask it to explain what the query does, document the query, or refactor it to improve performance.
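
For the database-help scenario above, the following queries show the kind of T-SQL that answers questions such as checking the compatibility level, enabling the Query Store, or finding the most recent backup. These are standard catalog queries, offered only as a sketch of what a response might contain; the database name is a placeholder.

```sql
-- Check the compatibility level of the current database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Enable the Query Store (database name is a placeholder).
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;

-- Find the most recent backup recorded for the current database (SQL Server).
SELECT TOP (1) backup_finish_date, type
FROM msdb.dbo.backupset
WHERE database_name = DB_NAME()
ORDER BY backup_finish_date DESC;
```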

Unsupported uses

  • Unrelated topics: Copilot in SSMS is not designed to answer questions outside of SQL databases. Use other Microsoft Copilots to ask questions about technologies adjacent to SQL. For example, use Azure Copilot to ask questions specific to Azure resources, or use Windows Copilot to ask questions specific to the Windows operating system. Do not ask questions about topics outside of SQL databases.

  • Malicious attacks: Avoid using Copilot in SSMS to assist with carrying out harmful actions that could irrevocably remove or change data in a production database. Copilot in SSMS executes queries under the security context of the user connected to Copilot. The user permissions dictate what Copilot can do, and any actions that delete or modify data could be traced back to the user.

Legal and regulatory considerations. Organizations and users need to evaluate potential specific legal and regulatory obligations when using any AI services and solutions, which might not be appropriate for use in every industry or scenario. Restrictions might vary based on regional or local regulatory requirements. Additionally, AI services or solutions are not designed for, and may not be used in, ways prohibited by applicable terms of service and relevant codes of conduct.

Limitations

Copilot in SSMS can generate T-SQL queries that the user can choose to run against the database. The queries run in the context of the user's permissions for the database and server; Copilot will not override or elevate the user's permissions. Any query that modifies data (DML) or the database schema (DDL) can be executed only if the user has the appropriate permissions. Prior to executing any DML or DDL, the user must also be in the appropriate mode (read/write with approval or read/write); the default mode is read-only. In read/write with approval mode, the user must confirm that they want Copilot to run the query against the database. Alternatively, the user can copy any T-SQL query (SELECT, DML, DDL) and run it in the Query Editor.
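
Because generated queries run under the connected user's permissions, one simple safeguard is to check what the current login can actually do before approving DML or DDL. A minimal sketch of such a check; the object name dbo.Orders is a placeholder:

```sql
-- List the effective permissions of the current user on the current database.
SELECT permission_name
FROM fn_my_permissions(NULL, 'DATABASE');

-- Check a specific permission, for example whether the user can alter a given table.
-- dbo.Orders is a placeholder object name.
SELECT HAS_PERMS_BY_NAME('dbo.Orders', 'OBJECT', 'ALTER') AS can_alter_table;
```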

Technical limitations, operational factors and ranges

Here are the current limitations of Copilot in SSMS:

  • The queries and responses generated might not be completely accurate, and when queries are executed they might not provide the results the user expected or intended to receive. Copilot is not a perfect system and might sometimes generate responses that are incorrect, incomplete, or irrelevant. This could happen for various reasons, such as ambiguity in the natural language prompt, limitations of the underlying natural language processing and generation models, or differences between our testing and the user's database or environment. To mitigate this, users should always review the responses generated by the tool and verify that they match their expectations and requirements. Users should also provide feedback to the SSMS team if they encounter any errors or issues with the queries. Users can submit feedback directly from SSMS.

  • Copilot might produce inaccurate results when the intent is to evaluate data. Copilot only has access to the database schema, not the data inside.

  • Copilot only supports English as the input language.

  • For Public Preview, Copilot in SSMS only supports using an endpoint and deployment that is created within an Azure OpenAI resource. Customers must create the resource, endpoint, and deployment in Azure to use Copilot in SSMS. Eventually, Copilot in SSMS might fully support bring-your-own-endpoint (BYOE), allowing customers to use any endpoint and deployment to which they have access.

System performance

Copilot in SSMS will be tested against a specific large language model (LLM), GPT-4o. This will be the recommended model for Copilot, but customers can choose to use any model that is available for their deployment, including small language models (SLMs). Copilot will not be tested against every available model, nor will it be tested against any models created by customers.

Best practices for improving system performance

The performance of Copilot in SSMS is dependent on the model and deployment selected for the Azure OpenAI resource. If a model is not trained on the entire set of SQL information available, it could provide incomplete or out-of-date responses. A large volume of prompts submitted to the same deployment could exceed the available quota or limit configured for the Azure OpenAI deployment, preventing users from submitting additional prompts and/or generating "rate limit exceeded" messages in Copilot.

Evaluation of Copilot in SSMS

Evaluation methods

Testing of Copilot in SSMS is performed using an internal evaluation framework that submits a prompt, captures the response from the model, and evaluates the response with one or more evaluation functions. Our most widely used evaluation function is a technique called "LLM as a judge," where a separate LLM is asked whether a certain statement is true about the response and to return how true the statement is on a scale of 1 to 5. When the LLM (in our case, usually GPT-4o) is used as a judge in this way, it tends to be extremely particular, giving a score of 5 only when every detail of the statement is perfectly and unambiguously true. Thus, our desired score in test cases is typically a 4, although it is configurable on a prompt-by-prompt basis. In our experience, the quality of the prompts and judging statements matters most. For that reason, we continuously tweak and update our test collateral, accepting new contributions from experts whenever possible.

Here are a few examples of prompts and the statement used to judge the response:

Prompt: What year had the highest internet sales?
Statement to judge: The response should say that the year with the highest sales is 2013.

Prompt: Write a script to back up the database.
Statement to judge: The response should explain that the platform automatically manages backups and it is not possible to create a backup via T-SQL.

Prompt: Create a job to run the Integration.GetCityUpdates SP every day at 6PM.
Statement to judge: The response should include msdb scripts for sp_add_job, sp_add_jobstep, sp_add_schedule, and sp_attach_schedule. The response should not have a USE msdb T-SQL statement. The response should have all access to msdb specified using three-part names (for example, msdb.dbo.sp_add_job). The job step should include EXEC Integration.GetCityUpdates as the command. The job schedule should be set to run daily at 6 PM.

In these examples we can see several considerations for evaluation. In the first prompt, the desired answer is dependent on the specific data in the connected database. To ensure stable conditions for this test, our framework restores a database backup prior to running the test suite. Many of our test suites restore a particular database backup in this way.
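
Restoring a known backup before each run is what keeps data-dependent answers (such as "the highest sales year is 2013") stable. A minimal sketch of such a restore; the database name and file path are placeholders:

```sql
-- Restore a known test database before running a data-dependent suite.
-- Database name and file path are placeholders.
RESTORE DATABASE AdventureWorksDW
FROM DISK = N'C:\Backups\AdventureWorksDW.bak'
WITH REPLACE, RECOVERY;
```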

In the second prompt, the desired answer is dependent on the connection context. This is the perfect answer when the user is connected to an Azure SQL database or SQL database in Fabric, but not when the user is connected to an on-prem SQL Server instance. Many of our test suites specify a particular connection context as well.

In the third prompt, our desired outcome is very specific and involves a lot of subject matter expert (SME) knowledge. Our evaluation framework has been specifically designed to make it easy for SQL experts to contribute test cases like these. With expert help, we can capture the slight differences and nuances between various SQL offerings, along with highly nontrivial SQL knowledge that takes years to develop.
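
For reference, a response that satisfies the third judging statement might look roughly like the following. The job name and target database are placeholders; the essential points are the three-part msdb names, the absence of a USE msdb statement, and the daily 6 PM schedule.

```sql
-- Hypothetical passing response: create an Agent job that runs Integration.GetCityUpdates daily at 6 PM.
-- Job name and target database are placeholders.
EXEC msdb.dbo.sp_add_job
     @job_name = N'Run Integration.GetCityUpdates';

EXEC msdb.dbo.sp_add_jobstep
     @job_name      = N'Run Integration.GetCityUpdates',
     @step_name     = N'Execute stored procedure',
     @subsystem     = N'TSQL',
     @database_name = N'WideWorldImporters',
     @command       = N'EXEC Integration.GetCityUpdates;';

EXEC msdb.dbo.sp_add_schedule
     @schedule_name     = N'Daily at 6 PM',
     @freq_type         = 4,        -- daily
     @freq_interval     = 1,        -- every day
     @active_start_time = 180000;   -- 18:00:00

EXEC msdb.dbo.sp_attach_schedule
     @job_name      = N'Run Integration.GetCityUpdates',
     @schedule_name = N'Daily at 6 PM';
```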

Finally, the evaluation framework also supports RAI testing for harmful content, jailbreak attacks, etc.: we utilize a large batch of prompts that present some risk and again use LLM as a judge to ensure that the risk was properly mitigated, usually by simply refusing to answer or engage.

Evaluation results

We automate the running of our test suites using Azure DevOps (ADO) pipelines, with a CI pipeline running nightly and covering as much surface area as possible. We test a wide variety of SQL offerings, with large databases, small databases, and everything in between. We ensure that our tests cover the full range of tools available to Copilot in SSMS by flagging any tools that were loaded but not used in any test case. We track the history of test cases across the nightly CI runs and triage any failing test cases. We report on the number of passing test cases and evaluations and drive that number up over time. Altogether, it is a well-maintained, test-driven development system that drives Copilot toward a high-fidelity experience.

Learn more about responsible AI

Learn more about Copilot in SSMS