PII cannot be detected in Spanish (Spain) text

Mohsin Khan 60 Reputation points
2025-05-19T08:23:00.4766667+00:00

Hey,

Greetings

PII is not detected in Spanish text when it appears inside a paragraph, but it is detected when it appears on a single line, even though Spanish is selected as the language.

Detected correctly:

Número de identificación fiscal – NIF (Spain) 12345678Z

Not detected correctly as ESDNI:

Carlos Martínez, un consultor freelance con residencia en Madrid, España, actualizó recientemente su registro personal y empresarial ante las autoridades españolas. Como parte del proceso, proporcionó su Documento Nacional de Identidad (DNI) 12345678Z,

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

1 answer

  1. Leo Tran 5 Reputation points Independent Advisor
    2025-05-20T09:57:01.5133333+00:00

    Hi Mohsin Khan,

    Thank you for contacting Q&A Forum. I would like to provide my findings and proposed solution:

    The Azure AI Language PII detection Preview API is still under development and may undergo changes. It is not recommended for production use. Customers should wait for the General Availability (GA) release, which will offer stable features and full support. For more information on the PII detection feature and its current capabilities, you can refer to the official documentation:

    Government and country/region-specific identification

    Azure AI Language's PII detection service supports Spanish and can identify entities like Spain's Documento Nacional de Identidad (DNI) when presented in isolation. However, its accuracy diminishes when such identifiers are embedded within longer, unstructured text.

    This limitation arises because the underlying models may not reliably recognize certain PII types in complex contexts. To improve detection, it's advisable to specify the language explicitly in your API requests, adjust confidence score thresholds, preprocess text to isolate potential PII elements, or supplement the service with custom logic such as regular expressions tailored to Spanish identification formats.
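
As a rough illustration of the regular-expression fallback mentioned above, here is a minimal Python sketch. It is not part of Azure AI Language; the function name `find_spanish_dni` and the pattern are assumptions made for this example. It looks for eight digits followed by a letter and keeps only candidates whose letter matches the standard DNI check-letter rule (number modulo 23), which helps filter out random digit runs.

```python
import re

# Check-letter table for the Spanish DNI: the letter is DNI_LETTERS[number % 23].
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"

# Eight digits, an optional hyphen, then one letter. The lookarounds reject
# matches that are embedded in a longer ASCII alphanumeric token.
DNI_PATTERN = re.compile(r"(?<![0-9A-Za-z])(\d{8})-?([A-Za-z])(?![0-9A-Za-z])")


def find_spanish_dni(text):
    """Return (matched_text, start, end) for each DNI candidate in `text`
    whose check letter is consistent with its eight-digit number.

    Hypothetical helper for illustration only; it does not call any Azure API.
    """
    hits = []
    for match in DNI_PATTERN.finditer(text):
        number = int(match.group(1))
        letter = match.group(2).upper()
        if DNI_LETTERS[number % 23] == letter:
            hits.append((match.group(0), match.start(), match.end()))
    return hits


# Usage with the end of the paragraph from the question.
sample = (
    "Como parte del proceso, proporcionó su Documento Nacional de "
    "Identidad (DNI) 12345678Z,"
)
print(find_spanish_dni(sample))
```

The offsets returned by this sketch could be merged with the entities returned by the service, or used to redact the text before or after calling it. The pattern covers only the plain DNI format; NIE numbers (which start with X, Y, or Z) and other identifiers would need their own rules.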

    Kindly let me know if this works for you, and please let me know if you have any further questions.

    If I have answered your question, please accept this as the answer as a token of appreciation, and don't forget to give a thumbs up for "Was it helpful"!

    Best regards,

