BasqueParl: Cracking the Rhetorical Imprint in the Basque Parliament Using Natural Language Processing
DOI:
https://doi.org/10.5294/pacla.2025.28.1.3
Keywords:
Open government, open data, political rhetoric, natural language processing, machine learning, automated indexing, discourse analysis
Abstract
This article examines the BasqueParl project, which applies machine learning and natural language processing (NLP) techniques to the analysis of political rhetoric in the Basque Parliament (Spain). The chamber is a good example of political code-switching in a bilingual setting with quantitatively balanced representation of men and women. The study builds on a documentary analysis of parliamentary records from 2012 to 2020, comprising 13,872,105 words in Basque and Spanish. Variables such as date, speaker, birth year, sex, party, language, lemmas, and entities are considered. To visualize the results, we present an experimental dashboard divided into four sections (Interventions, Tables, LDA, and Scattertext), built with lemmatization and named entity recognition tools, among others. The dashboard offers a graphical view of parliamentary activity, showing that 21% of rhetorical production is in Basque. Other findings are that women speak less and that minority parties have a disproportionate discursive presence. Furthermore, political issues follow a traditional gender-based distribution. We conclude that, in the era of open government and open data, these tools are essential for promoting transparency in public administration.
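As a rough illustration of how a figure such as the share of rhetorical production per language could be derived from per-intervention records, the following minimal sketch groups word counts by one of the variables listed in the abstract. It is not the project's actual pipeline: the `Intervention` record, the `word_share` helper, the whitespace tokenization, and the two sample rows are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Intervention:
    """Hypothetical record holding a few of the variables named in the abstract."""
    speaker: str
    sex: str
    party: str
    language: str  # assumed codes: "eu" for Basque, "es" for Spanish
    text: str


def word_share(interventions, key):
    """Return the fraction of all words attributed to each value of `key`.

    Words are counted by naive whitespace splitting, which is an assumption;
    a real pipeline would more likely rely on a proper tokenizer.
    """
    counts = Counter()
    for item in interventions:
        counts[getattr(item, key)] += len(item.text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


# Invented toy data, not taken from the corpus.
SAMPLE = [
    Intervention("Speaker A", "F", "Party 1", "eu", "egun on guztioi"),
    Intervention("Speaker B", "M", "Party 2", "es", "buenos días a todos y a todas"),
]

if __name__ == "__main__":
    print(word_share(SAMPLE, "language"))
    print(word_share(SAMPLE, "sex"))
```

The same helper can be pointed at `party` or `sex` instead of `language`, which mirrors the kind of per-group comparison the dashboard is described as offering.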
Copyright (c) 2025 Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Nayla Escribano, Simón Peña Fernández, Rodrigo Agerri

This work is licensed under a Creative Commons Attribution 4.0 International License.
1. Proposed Policy for Journals That Offer Open Access
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Funding data
- Ministerio de Ciencia, Innovación y Universidades. Grant numbers PID2020-114193RB-I00; PID2020-114584GB-I00
