BasqueParl: Cracking the Rhetorical Imprint in the Basque Parliament Using Natural Language Processing

Authors

J. Orbegozo-Terradillos, A. Larrondo-Ureta, N. Escribano, S. Peña Fernández, R. Agerri

DOI:

https://doi.org/10.5294/pacla.2025.28.1.3

Keywords:

Open government, open data, political rhetoric, natural language processing, machine learning, automated indexing, discourse analysis

Abstract

This article focuses on the BasqueParl project, which uses machine learning and natural language processing (NLP) techniques to analyze political rhetoric in the Basque Parliament (Spain). The Basque Parliament is a good example of political code-switching in a bilingual context, and of quantitatively balanced political representation between men and women. The study builds on a documentary analysis of parliamentary records from 2012 to 2020, comprising 13,872,105 words in Basque and Spanish. The variables considered include date, speaker, birth year, sex, party, language, lemmas, and named entities. To visualize the results, we present an experimental dashboard divided into four sections (Interventions, Tables, LDA, and Scattertext), built with lemmatization and named entity recognition tools, among others. The dashboard offers a graphic view of parliamentary activity, showing that 21% of rhetorical production is in Basque. Other findings include that women speak less and that minority parties have a disproportionate discursive presence. Furthermore, political issues are distributed along traditional gender lines. We conclude that, in the era of open government and open data, these tools are essential to promote transparency in public administration.
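
Findings such as "21% of rhetorical production is in Basque" or "minority parties have a disproportionate discursive presence" are word-share statistics over grouped interventions. The sketch below shows one way such shares could be computed; it is not the project's own pipeline. The `Intervention` record, its field names, the whitespace tokenization, the `presence_ratio` helper, and the toy sample are all hypothetical and unrelated to the reported results.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Intervention:
    # Hypothetical record; fields loosely mirror variables listed in the abstract.
    speaker: str
    sex: str       # e.g. "F" or "M"
    party: str
    language: str  # e.g. "eu" (Basque) or "es" (Spanish)
    text: str


def word_counts_by(interventions, attr):
    """Total whitespace-delimited tokens for each value of `attr`."""
    counts = Counter()
    for iv in interventions:
        counts[getattr(iv, attr)] += len(iv.text.split())
    return counts


def word_share_by(interventions, attr):
    """Fraction of all tokens produced by each value of `attr` (empty dict if no tokens)."""
    counts = word_counts_by(interventions, attr)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def presence_ratio(word_share, seat_share):
    """Word share divided by seat share; a ratio above 1 means a group speaks
    more than its share of seats alone would predict (hypothetical measure)."""
    return {k: word_share.get(k, 0.0) / s for k, s in seat_share.items() if s > 0}


# Invented toy data (11 tokens in total), not drawn from the corpus.
SAMPLE = [
    Intervention("A", "F", "P1", "eu", "egun on guztioi"),              # 3 tokens
    Intervention("B", "M", "P2", "es", "buenos días a todos y todas"),  # 6 tokens
    Intervention("C", "M", "P1", "es", "gracias presidenta"),           # 2 tokens
]

if __name__ == "__main__":
    print(word_share_by(SAMPLE, "language"))  # Basque 3/11, Spanish 8/11
    print(word_share_by(SAMPLE, "sex"))
    print(presence_ratio(word_share_by(SAMPLE, "party"), {"P1": 0.5, "P2": 0.5}))
```

Counting raw tokens keeps the example transparent; a real analysis would likely count words after its own preprocessing, and could also normalize by the number of speakers in each group rather than reporting total shares only.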

Published

2025-05-07

How to Cite

Orbegozo-Terradillos, J., Larrondo-Ureta, A., Escribano, N., Peña Fernández, S., & Agerri, R. (2025). BasqueParl: Cracking the Rhetorical Imprint in the Basque Parliament Using Natural Language Processing. Palabra Clave, 28(1), e2813. https://doi.org/10.5294/pacla.2025.28.1.3

Issue

Vol. 28 No. 1 (2025)

Section

Articles