Azure Managed Instance for Apache Cassandra is a fully managed service for pure open-source Apache Cassandra clusters. The service also allows configurations to be overridden, depending on the specific needs of each workload, providing maximum flexibility and control where needed.
This quickstart demonstrates how to use the Azure CLI commands to configure a hybrid cluster. If you have existing datacenters in an on-premises or self-hosted environment, you can use Azure Managed Instance for Apache Cassandra to add other datacenters to those clusters and maintain them.
Prerequisites
Use the Bash environment in Azure Cloud Shell. For more information, see Get started with Azure Cloud Shell.
If you prefer to run CLI reference commands locally, install the Azure CLI. If you're running on Windows or macOS, consider running Azure CLI in a Docker container. For more information, see How to run the Azure CLI in a Docker container.
If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see Authenticate to Azure using Azure CLI.
When you're prompted, install the Azure CLI extension on first use. For more information about extensions, see Use and manage extensions with the Azure CLI.
Run az version to find the version and dependent libraries that are installed. To upgrade to the latest version, run az upgrade.
Azure CLI version 2.30.0 or higher. If you're using Azure Cloud Shell, the latest version is already installed.
Azure Virtual Network with connectivity to your self-hosted or on-premises environment. For more information about connecting on-premises environments to Azure, see Connect an on-premises network to Azure.
Configure a hybrid cluster
Sign in to the Azure portal and navigate to your virtual network resource.
Open the Subnets tab and create a new subnet. To learn more about the fields in the Added subnet form, see Add a subnet.
Note
The deployment of Azure Managed Instance for Apache Cassandra requires internet access. Deployment fails in environments where internet access is restricted. Make sure you aren't blocking access within your virtual network to the following vital Azure services that are necessary for Managed Cassandra to work properly. For a list of IP address and port dependencies, see Required outbound network rules.
- Azure Storage
- Azure KeyVault
- Azure Virtual Machine Scale Sets
- Azure Monitoring
- Microsoft Entra ID
- Azure Security
Using Azure CLI, apply the special permissions to the virtual network and subnet that Azure Managed Instance for Apache Cassandra requires. Use the `az role assignment create` command, replacing `<subscriptionID>`, `<resourceGroupName>`, and `<vnetName>` with the appropriate values:

```azurecli
az role assignment create \
  --assignee a232010e-820c-4083-83bb-3ace5fc29d0b \
  --role 4d97b98b-1d4f-4787-a291-c67834d212e7 \
  --scope /subscriptions/<subscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Network/virtualNetworks/<vnetName>
```
Note

The `assignee` and `role` values in the previous command are fixed service principal and role identifiers, respectively.

Next, configure resources for the hybrid cluster. Because you already have a cluster, the cluster name here is a logical resource that identifies the name of your existing cluster. Make sure to use the name of your existing cluster when defining the `clusterName` and `clusterNameOverride` variables in the following script.

You also need, at minimum, the seed nodes from your existing datacenter, and the gossip certificates required for node-to-node encryption. Azure Managed Instance for Apache Cassandra requires node-to-node encryption for communication between datacenters. If you don't have node-to-node encryption implemented in your existing cluster, implement it. For more information, see Node-to-node encryption.

Supply the path to the ___location of the certificates. Each certificate should be in PEM format, for example, `-----BEGIN CERTIFICATE-----\n...PEM format 1...\n-----END CERTIFICATE-----`. In general, there are two ways of implementing certificates:

- Self-signed certificates: a private and public (no CA) certificate for each node. In this case, you need all public certificates.
- Certificates signed by a CA: the CA can be self-signed or even a public one. In this case, you need the root CA certificate and all intermediaries, if applicable. For more information, see Preparing SSL certificates for production.
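If you're experimenting with the self-signed approach, a node certificate in PEM format can be generated with openssl. This is only a sketch: the subject name and file names below are placeholders, not values the service requires.

```shell
# Generate a self-signed certificate and private key for one node (placeholder names).
# -nodes leaves the private key unencrypted; adjust for production use.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout node1-key.pem \
  -out node1-cert.pem \
  -days 365 \
  -subj "/CN=node1.cassandra.internal"

# The public certificate is what you would supply to --external-gossip-certificates.
head -1 node1-cert.pem
# → -----BEGIN CERTIFICATE-----
```

The private key stays on the node; only the public certificate is shared with the managed instance.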
Optionally, if you want to implement client-to-node certificate authentication or mutual Transport Layer Security (mTLS) as well, provide the certificates in the same format when you create the hybrid cluster, using the `--client-certificates` parameter (see the Azure CLI sample later in this article). This approach uploads and applies your client certificates to the truststore for your Cassandra managed instance cluster, so you don't need to edit cassandra.yaml settings. Once applied, your cluster requires Cassandra to verify the certificates when a client connects. See `require_client_auth: true` in Cassandra client_encryption_options.

Note

The value of the `delegatedManagementSubnetId` variable you supply in this code is the `--scope` value that you supplied in the earlier role assignment command, with the subnet segment appended:

```azurecli
resourceGroupName='MyResourceGroup'
clusterName='cassandra-hybrid-cluster-legal-name'
clusterNameOverride='cassandra-hybrid-cluster-illegal-name'
___location='eastus2'
delegatedManagementSubnetId='/subscriptions/<subscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Network/virtualNetworks/<vnetName>/subnets/<subnetName>'

# You can override the cluster name if the original name isn't legal for an Azure resource:
# overrideClusterName='ClusterNameIllegalForAzureResource'
# The default Cassandra version is v3.11.

az managed-cassandra cluster create \
  --cluster-name $clusterName \
  --resource-group $resourceGroupName \
  --___location $___location \
  --delegated-management-subnet-id $delegatedManagementSubnetId \
  --external-seed-nodes 10.52.221.2 10.52.221.3 10.52.221.4 \
  --external-gossip-certificates /usr/csuser/clouddrive/rootCa.pem /usr/csuser/clouddrive/gossipKeyStore.crt_signed
  # Optional: add your existing datacenter's client-to-node certificates (if implemented):
  # --client-certificates /usr/csuser/clouddrive/rootCa.pem /usr/csuser/clouddrive/nodeKeyStore.crt_signed
```
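To see how the subnet ID relates to the earlier role-assignment scope, the value can be composed in bash. All names here are placeholders; substitute your own:

```shell
# Placeholder identifiers -- replace with your own values.
subscriptionID='00000000-0000-0000-0000-000000000000'
resourceGroupName='MyResourceGroup'
vnetName='MyVnet'
subnetName='MySubnet'

# The scope used for the role assignment targets the virtual network...
scope="/subscriptions/$subscriptionID/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName"

# ...and the delegated subnet ID is that scope plus the subnet segment.
delegatedManagementSubnetId="$scope/subnets/$subnetName"
echo "$delegatedManagementSubnetId"
```

Composing the ID this way avoids copy-paste drift between the role assignment and the cluster create command.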
Note

If your cluster already has node-to-node and client-to-node encryption, you should know where your existing client and/or gossip SSL certificates are kept. If you're uncertain, run `keytool -list -keystore <keystore-path> -rfc -storepass <password>` to print the certificates.

After the cluster resource is created, run the following command to get the cluster setup details:

```azurecli
resourceGroupName='MyResourceGroup'
clusterName='cassandra-hybrid-cluster'

az managed-cassandra cluster show \
  --cluster-name $clusterName \
  --resource-group $resourceGroupName
```
The previous command returns information about the managed instance environment, including the certificates in the `gossipCertificates` array. You need the gossip certificates so that you can install them on the trust store for nodes in your existing datacenter.
Note

The certificates returned from the preceding command contain line breaks represented as text, for example `\r\n`. Copy each certificate to a file and format it before attempting to import it into your existing trust store.

Tip

Copy the `gossipCertificates` array value from the command output into a file (for example, gossipCertificates.txt), and use the following bash script to format the certificates and create a separate PEM file for each. The script requires jq; to install it, see Download jq for your platform.

```bash
# Read each certificate object from the JSON array into a bash array.
readarray -t cert_array < <(jq -c '.[]' gossipCertificates.txt)

# Iterate through the array, format each certificate, and write it to a numbered file.
num=0
for item in "${cert_array[@]}"; do
  num=$((num + 1))
  filename="cert$num.pem"
  cert=$(jq '.pem' <<< $item)
  echo -e $cert >> $filename
  sed -e 's/^"//' -e 's/"$//' -i $filename
done
```
Next, create a new datacenter in the hybrid cluster. Replace the variable values with your cluster details:

```azurecli
resourceGroupName='MyResourceGroup'
clusterName='cassandra-hybrid-cluster'
dataCenterName='dc1'
dataCenterLocation='eastus2'
virtualMachineSKU='Standard_D8s_v4'
noOfDisksPerNode=4

az managed-cassandra datacenter create \
  --resource-group $resourceGroupName \
  --cluster-name $clusterName \
  --data-center-name $dataCenterName \
  --data-center-___location $dataCenterLocation \
  --delegated-subnet-id $delegatedManagementSubnetId \
  --node-count 9 \
  --sku $virtualMachineSKU \
  --disk-capacity $noOfDisksPerNode \
  --availability-zone false
```
Note

The value for `--sku` can be chosen from the following available SKUs:

- Standard_E8s_v4
- Standard_E16s_v4
- Standard_E20s_v4
- Standard_E32s_v4
- Standard_DS13_v2
- Standard_DS14_v2
- Standard_D8s_v4
- Standard_D16s_v4
- Standard_D32s_v4

The value for `--availability-zone` is set to `false`. To enable availability zones, set this value to `true`. Availability zones increase the availability SLA of the service. For more information, see SLA for Online Services.

Warning
Availability zones aren't supported in all regions. Deployments fail if you select a region where availability zones aren't supported. For supported regions, see the Azure regions list.
The successful deployment of availability zones is also subject to the availability of compute resources in all of the zones in the given region. Deployments might fail if the SKU you selected, or capacity, isn't available across all zones.
Now that the new datacenter is created, run the show datacenter command to view its details:
```azurecli
resourceGroupName='MyResourceGroup'
clusterName='cassandra-hybrid-cluster'
dataCenterName='dc1'

az managed-cassandra datacenter show \
  --resource-group $resourceGroupName \
  --cluster-name $clusterName \
  --data-center-name $dataCenterName
```
The previous command displays the new datacenter's seed nodes in its output.
Add the new datacenter's seed nodes to your existing datacenter's seed node configuration in the cassandra.yaml file. Then install the managed instance gossip certificates that you collected earlier to the trust store for each node in your existing cluster, using the `keytool` command for each certificate:

```bash
keytool -importcert -keystore generic-server-truststore.jks -alias CassandraMI -file cert1.pem -noprompt -keypass myPass -storepass truststorePass
```
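If the formatting script produced several numbered files (cert1.pem, cert2.pem, ...), the imports can be looped. This is a sketch: the truststore path, alias prefix, and password are placeholders, and each certificate must get a unique alias or `keytool` rejects the import.

```shell
# Placeholder truststore path and password -- substitute your own.
truststore='generic-server-truststore.jks'
storepass='truststorePass'

for certfile in cert*.pem; do
  [ -e "$certfile" ] || continue            # skip cleanly if no files match
  alias="CassandraMI-${certfile%.pem}"      # e.g. cert1.pem -> alias CassandraMI-cert1
  keytool -importcert \
    -keystore "$truststore" \
    -alias "$alias" \
    -file "$certfile" \
    -storepass "$storepass" \
    -noprompt
done
```

Run the same loop on every node in the existing cluster, or distribute the updated truststore file to each node.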
Note
If you want to add more datacenters, you can repeat the preceding steps, but you only need the seed nodes.
Important

If your existing Apache Cassandra cluster only has a single datacenter, and this datacenter is the first being added, ensure that the `endpoint_snitch` parameter in `cassandra.yaml` is set to `GossipingPropertyFileSnitch`.

Important
If your existing application code is using QUORUM for consistency, ensure that before changing the replication settings in the next step, your existing application code is using LOCAL_QUORUM to connect to your existing cluster. Otherwise, live updates fail after you change replication settings in the following step. After you change the replication strategy, you can revert to QUORUM if preferred.
Finally, use the following CQL query to update the replication strategy in each keyspace to include all datacenters across the cluster:
```sql
ALTER KEYSPACE "ks" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'on-premise-dc': 3, 'managed-instance-dc': 3};
```
You also need to update the replication strategy for several system keyspaces:
```sql
ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'on-premise-dc': 3, 'managed-instance-dc': 3};
ALTER KEYSPACE "system_distributed" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'on-premise-dc': 3, 'managed-instance-dc': 3};
ALTER KEYSPACE "system_traces" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'on-premise-dc': 3, 'managed-instance-dc': 3};
```
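If you have many keyspaces to update, the ALTER statements can be generated in bash and piped to your CQL client. This is a sketch: the keyspace list and datacenter names below are placeholders matching the examples above, not names the service dictates.

```shell
# Keyspaces to update and the target replication map (placeholder values).
keyspaces=(ks system_auth system_distributed system_traces)
replication="{'class': 'NetworkTopologyStrategy', 'on-premise-dc': 3, 'managed-instance-dc': 3}"

# Emit one ALTER KEYSPACE statement per keyspace.
for ks in "${keyspaces[@]}"; do
  echo "ALTER KEYSPACE \"$ks\" WITH REPLICATION = $replication;"
done

# To apply, pipe the output to cqlsh, for example:
#   ./generate-alters.sh | cqlsh <host>
```

Generating the statements keeps the replication map consistent across every keyspace.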
Important
If the data centers in your existing cluster don't enforce client-to-node encryption (SSL), and you intend for your application code to connect directly to Cassandra managed instance, you also need to enable SSL in your application code.
Use hybrid cluster for real-time migration
The preceding instructions provide guidance for configuring a hybrid cluster. This approach is also a great way of achieving a seamless zero-downtime migration. The following procedure describes how to migrate an on-premises or other Cassandra environment that you want to decommission to Azure Managed Instance for Apache Cassandra, with zero downtime.
Configure hybrid cluster. Follow the previous instructions.
Temporarily disable automatic repairs in Azure Managed Instance for Apache Cassandra during the migration:
```azurecli
az managed-cassandra cluster update \
  --resource-group $resourceGroupName \
  --cluster-name $clusterName \
  --repair-enabled false
```
In Azure CLI, use the following command to run `nodetool rebuild` on each node in your new Azure Managed Instance for Apache Cassandra datacenter. Replace `<ip address>` with the IP address of the node, and `<sourcedc>` with the name of your existing datacenter (the one you're migrating from):

```azurecli
az managed-cassandra cluster invoke-command \
  --resource-group $resourceGroupName \
  --cluster-name $clusterName \
  --host <ip address> \
  --command-name nodetool --arguments rebuild="" "<sourcedc>"=""
```
Run this command only after you complete all of the prior steps. This approach ensures that all historical data is replicated to your new datacenters in Azure Managed Instance for Apache Cassandra. You can run rebuild on one or more nodes at the same time: run on one node at a time to reduce the impact on the existing cluster, or on multiple nodes when the cluster can handle the extra I/O and network pressure. For most installations, run only one or two in parallel to avoid overloading the cluster.
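A sequential rebuild over the new datacenter's nodes can be scripted as below. This is a sketch: the node IP list, source datacenter name, and resource names are placeholders; substitute the values reported for your own deployment.

```shell
# Placeholder resource names and node IPs -- substitute your own.
resourceGroupName='MyResourceGroup'
clusterName='cassandra-hybrid-cluster'
sourceDc='on-premise-dc'
nodes=(10.52.221.10 10.52.221.11 10.52.221.12)

# Rebuild one node at a time to limit load on the source datacenter.
for node in "${nodes[@]}"; do
  echo "Rebuilding $node from $sourceDc"
  az managed-cassandra cluster invoke-command \
    --resource-group "$resourceGroupName" \
    --cluster-name "$clusterName" \
    --host "$node" \
    --command-name nodetool --arguments rebuild="" "$sourceDc"="" \
    || { echo "rebuild failed on $node; stopping"; break; }
done
```

Stopping on the first failure avoids piling rebuild streams onto a cluster that is already struggling.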
Warning

You must specify the source datacenter when you run `nodetool rebuild`. If you provide the datacenter incorrectly on the first attempt, token ranges are copied without data being copied for your nonsystem tables, and subsequent attempts fail even if you provide the datacenter correctly. You can resolve this issue by deleting the entries for each nonsystem keyspace in `system.available_ranges`, using the `cqlsh` query tool in your target Cassandra managed instance datacenter:

```sql
delete from system.available_ranges where keyspace_name = 'myKeyspace';
```
Cut over your application code to point to the seed nodes in your new Azure Managed Instance for Apache Cassandra data centers.
Important
As also mentioned in the hybrid setup instructions, if the data centers in your existing cluster don't enforce client-to-node encryption (SSL), enable this feature in your application code, because Cassandra managed instance enforces this requirement.
Run ALTER KEYSPACE for each keyspace, in the same manner as done earlier, but now removing your old data centers.
Run `nodetool decommission` for each old datacenter node.
Switch your application code back to QUORUM, if necessary or preferred.
Re-enable automatic repairs:
```azurecli
az managed-cassandra cluster update \
  --resource-group $resourceGroupName \
  --cluster-name $clusterName \
  --repair-enabled true
```
Troubleshooting
If you encounter an error when you apply permissions to your virtual network using Azure CLI, you can apply the same permission manually from the Azure portal. An example of such an error is Can't find user or service principal in graph database for 'e5007d2c-4b13-4a74-9b6a-605d99f03501'. For more information, see Use Azure portal to add Azure Cosmos DB service principal.
Note
The Azure Cosmos DB role assignment is used for deployment purposes only. Azure Managed Instance for Apache Cassandra has no backend dependencies on Azure Cosmos DB.
Clean up resources
If you're not going to continue to use this managed instance cluster, delete it with the following steps:
- From the left-hand menu of Azure portal, select Resource groups.
- From the list, select the resource group you created for this quickstart.
- On the resource group Overview pane, select Delete resource group.
- In the next window, enter the name of the resource group to delete, and then select Delete.