Fine-Tuning Model Stuck: Job Enqueued, Waiting for Prior Jobs to Complete

Question


Billy Zhou

When I use the Python SDK (the AzureOpenAI client) to fine-tune a GPT-4o-mini model, the job has been stuck at "Job enqueued. Waiting for the jobs ahead to complete" for over 12 hours. Could you please help check it?


Sample Python code:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://xxxxx.azure.com/",
    api_key="xxxxx",
    api_version="2024-12-01-preview",
)


def start_fine_tuning():
    response = client.fine_tuning.jobs.create(
        training_file="file-f33a9fb8ef214edfaa94aa6d9a707f48xx",
        validation_file="file-a3cf2c8e28304bea8f2edd053f8e8014xx",
        # Enter the base model name. In Azure OpenAI, model names use dashes and cannot contain dot/period characters.
        model="gpt-4o-mini",
        hyperparameters={"n_epochs": 2},
        # The seed controls reproducibility of the fine-tuning job. If no seed is specified, one is generated automatically.
        seed=105,
    )
    # Use the job ID to monitor the status of the fine-tuning job.
    # The job will take some time to start and complete.
    print("Job ID:", response.id)
    print("Status:", response.status)
    print(response.model_dump_json(indent=2))


if __name__ == "__main__":
    start_fine_tuning()
```


Answer 1

Hi Billy Zhou,

I understand that your fine-tuning job for the GPT-4o-mini model has been in the "Job enqueued. Waiting for the jobs ahead to complete" state for over 12 hours. This status typically indicates that the job is waiting in a queue because of high demand for fine-tuning resources. Azure OpenAI fine-tuning jobs run on shared compute infrastructure, and at times, especially with popular models like GPT-4o-mini, jobs may wait in the queue longer than expected.
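If you prefer to confirm the queued state from code rather than the portal, the sketch below reads the job object and one page of its event log. It assumes the openai v1 SDK methods `client.fine_tuning.jobs.retrieve` and `client.fine_tuning.jobs.list_events` and takes the AzureOpenAI client from the question as a parameter. The helper name `summarize_job` and the set of "still waiting" status strings are illustrative assumptions, so compare them with what your own job actually reports.

```python
import time
from typing import Any, Dict, List, Optional

# Assumption: status strings meaning training has not started yet.
# Azure OpenAI and OpenAI have used different values, so check your own output.
WAITING_STATUSES = {"pending", "queued", "validating_files"}


def summarize_job(
    client: Any,
    job_id: str,
    now: Optional[float] = None,
    event_limit: int = 5,
) -> Dict[str, Any]:
    """Report a fine-tuning job's status, age in hours, and recent event messages.

    `client` is an AzureOpenAI client, or any object exposing the same
    fine_tuning.jobs.retrieve and fine_tuning.jobs.list_events methods.
    """
    job = client.fine_tuning.jobs.retrieve(job_id)
    # One page of events; the service decides the order (typically newest first).
    events = client.fine_tuning.jobs.list_events(
        fine_tuning_job_id=job_id, limit=event_limit
    )
    now = time.time() if now is None else now
    # created_at is a Unix timestamp in seconds.
    hours_since_created = (now - job.created_at) / 3600
    messages: List[str] = [event.message for event in events.data]
    return {
        "status": job.status,
        "still_waiting": job.status in WAITING_STATUSES,
        "hours_since_created": round(hours_since_created, 1),
        "recent_events": messages,
    }
```

Calling it with the job ID printed by the question's code shows whether the job is still waiting, how long it has existed, and whether the latest event is still the "Job enqueued" message.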

While 12 hours is within the range of expected queuing delays during peak periods, we recognize that this can be inconvenient. We recommend monitoring the job in the Fine-tuning section of Azure AI Studio and refreshing the portal periodically to check for status updates. If there’s no progress after a longer wait (e.g., 24 hours), you may consider canceling the current job and resubmitting it, which can sometimes resolve hidden queuing or scheduling issues.
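The monitor-then-resubmit advice can be sketched as a simple polling loop. This is a minimal, hypothetical helper (`wait_or_resubmit`), assuming the openai v1 SDK's `retrieve`, `cancel`, and `create` methods on `client.fine_tuning.jobs`. The 24-hour default threshold follows the example above; the 10-minute poll interval and the set of waiting statuses are assumptions. It measures age from the job's `created_at` timestamp, so time spent validating files also counts toward the threshold.

```python
import time
from typing import Any, Callable, Dict

# Assumption: status strings meaning training has not started yet.
WAITING_STATUSES = {"pending", "queued", "validating_files"}


def wait_or_resubmit(
    client: Any,
    job_id: str,
    create_kwargs: Dict[str, Any],
    threshold_hours: float = 24.0,
    poll_seconds: float = 600.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a fine-tuning job; if it is still waiting after `threshold_hours`,
    cancel it and submit a new job with `create_kwargs`.

    Returns the ID of the job to keep following (original or resubmitted).
    """
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status not in WAITING_STATUSES:
            # Running, succeeded, failed, or cancelled: nothing to resubmit.
            return job.id
        hours_since_created = (clock() - job.created_at) / 3600
        if hours_since_created >= threshold_hours:
            client.fine_tuning.jobs.cancel(job_id)
            new_job = client.fine_tuning.jobs.create(**create_kwargs)
            return new_job.id
        sleep(poll_seconds)
```

For `create_kwargs`, pass the same arguments the question gives to `client.fine_tuning.jobs.create` (`training_file`, `validation_file`, `model`, `hyperparameters`, `seed`). The `clock` and `sleep` parameters exist only so the loop can be exercised without actually waiting.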

In parallel, we recommend checking the Azure Status Page for any ongoing service disruptions or regional capacity issues that might be affecting your fine-tuning job.

Reference thread: https://learn.microsoft.com/en-us/answers/questions/2260568/fine-tuning-job-stuck-in-training-status-for-over

Also, you can refer to "Check the status of your custom model" and "Troubleshooting for Azure OpenAI fine-tuning."

I hope this helps. Do let me know if you have any further queries.

Thank you!
