Opendatabay APP

Portuguese-Brazilian Personal Name Archive

Census & Demographics

Tags and Keywords

Brazil

Names

Portuguese

Linguistics

Text

Free

About

This collection is a large registry of Brazilian first names as recorded by the IBGE (Brazilian Institute of Geography and Statistics). It includes a wide array of name variations and reflects the naming conventions found across the country's regions. Its main uses are in natural language processing, particularly the masking of personal identifiers, and in the study of Brazilian onomastics.

Columns

  • Index: A numerical identifier assigned to each unique entry in the list.
  • nomes-pt-br: The specific first name or variation recorded, presented in alphabetical order.

Distribution

The data is delivered in a single CSV file titled all-pt-br-names.csv, which has a file size of approximately 1.44 MB. It contains 103,453 unique records, demonstrating a 100% validity rate with no missing or mismatched entries. The resource has been awarded a usability score of 10.00 and is maintained as a static archive with no future updates planned.
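
As a minimal sketch of how the file could be read and checked against the figures above, the snippet below uses Python's standard csv module. The column name nomes-pt-br comes from the Columns section; the actual header row, the file encoding, and the helper names load_names and summarize are assumptions for illustration. The sample rows are made up and stand in for the real file.

```python
import csv
import io

# Column name as given in the listing; verify it against the real header row.
NAME_COLUMN = "nomes-pt-br"


def load_names(handle):
    """Read first names from an open CSV handle, skipping blank entries."""
    names = []
    for row in csv.DictReader(handle):
        name = (row.get(NAME_COLUMN) or "").strip()
        if name:
            names.append(name)
    return names


def summarize(names):
    """Report row count, distinct count, and whether the list is sorted.

    The sort check uses Python's default code-point order, which may not
    match the collation used to alphabetize the real file.
    """
    return {
        "rows": len(names),
        "unique": len(set(names)),
        "sorted": names == sorted(names),
    }


# Tiny in-memory stand-in for all-pt-br-names.csv (made-up rows, not real data).
sample = io.StringIO("Index,nomes-pt-br\n0,Ana\n1,Bruno\n2,Carla\n")
print(summarize(load_names(sample)))
```

Against the real file, the same functions could be called on a handle from open("all-pt-br-names.csv", encoding="utf-8", newline=""), assuming UTF-8 encoding. If the listing is accurate, rows and unique should both come out at 103,453.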

Usage

This resource is ideal for de-identifying sensitive text by providing a reference list to detect and redact personal names in Portuguese documents. It is well-suited for training machine learning models in entity recognition or developing autocomplete features for localized applications. Additionally, researchers can use the alphabetical list to conduct linguistic analysis or to populate synthetic datasets for testing software in a Brazilian context.
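
The redaction use case can be sketched as a simple dictionary lookup. This is an illustrative approach under stated assumptions, not a method prescribed by the dataset: the function name build_redactor and the placeholder [NOME] are hypothetical, and the three names below are made-up stand-ins for the full list.

```python
import re
import unicodedata


def build_redactor(names, placeholder="[NOME]"):
    """Return a function that replaces whole-word name matches with a placeholder."""
    # Normalize to NFC and casefold so accented and differently cased forms match.
    lookup = {unicodedata.normalize("NFC", n).casefold() for n in names}
    word = re.compile(r"\w+")

    def redact(text):
        text = unicodedata.normalize("NFC", text)

        def swap(match):
            token = match.group(0)
            return placeholder if token.casefold() in lookup else token

        return word.sub(swap, text)

    return redact


# Made-up stand-in for the full name list.
redact = build_redactor(["Ana", "João", "Bruno"])
print(redact("A Ana ligou para o João ontem."))
# prints: A [NOME] ligou para o [NOME] ontem.
```

A plain lookup like this will over-redact, since some first names are also ordinary Portuguese words, and it will miss surnames and names absent from the list. In practice it is better treated as one signal alongside a trained entity recognizer than as a complete de-identification step.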

Coverage

The geographic scope is limited to Brazil, while the demographic scope encompasses the diverse population recorded by the IBGE. The collection includes name variations used throughout the country, ensuring that regional naming trends are represented. The data is based on census-style collection methods, providing a broad view of the names currently in use across the nation.

License

CC BY-SA 4.0

Who Can Use It

Data scientists can use these records to build anonymisation tools that support compliance with privacy regulations such as the LGPD. Linguists may utilise the entries to study the forms and spelling variations of Portuguese names in Brazil; because the file contains only an index and a name column, work on name frequency or change over time would need additional sources. Software developers and app creators can also use the registry to make their products culturally relevant and inclusive for the Brazilian market.

Dataset Name Suggestions

  • Brazilian First Names: IBGE National Registry
  • Portuguese-Brazilian Personal Name Archive
  • Brazil Name Registry for Data De-identification
  • IBGE Brazilian Names: Alphabetical Registry
  • Portuguese Linguistic Assets: First Names of Brazil

Attributes

Listing Stats

VIEWS

2

DOWNLOADS

0

LISTED

27/12/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
