Artificial Intelligence and Natural Language Processing

Artificial Intelligence (AI) is a branch of computer science that seeks to develop systems capable of mimicking and enhancing human abilities such as reasoning, learning, and perception. One of its most fascinating subareas is Natural Language Processing (NLP), dedicated to understanding and interpreting human language, allowing machines and programs to interact with people more efficiently and intuitively. NLP encompasses everything from syntactic and semantic analysis of words to the generation of coherent and meaningful text, enabling the creation of virtual assistants, automatic translation systems, and other applications that facilitate communication and information access in our daily lives.

About the Area Coordinator

Prof. Sergio Antônio Andrade de Freitas

Professor and researcher in the field of Artificial Intelligence (AI) and Natural Language Processing (NLP), he holds a tenured position at the University of Brasília (UnB). His contributions to NLP include the automated interpretation of anaphora, a pioneering study of how machines can process and interpret human text that advanced anaphora resolution and computational text comprehension. His AI research explores the integration of advanced computational methods to develop more intelligent and intuitive systems and to enhance human-computer interaction. Currently, he is a faculty member of the Software Engineering course and an active member of the Graduate Program in Applied Computing (PPCA).

Research Team

Master’s degree

Bachelor’s Thesis

Previous Researchers

Master’s degree

  1. Edmilson Cosme da Silva, Prediction of academic dropout in higher education - the case of face-to-face undergraduate courses at the University of Brasília . Master's Thesis in Applied Computing, University of Brasília (Brazil), 2023 Advisor(s): Sergio Freitas . Tags: Learning Analytics, Machine Learning .
    Researchers have been investigating attrition in higher education, identifying two categories - students who leave the university and those who abandon higher education altogether, negatively affecting institutions, students, and society. Since 1995, in Brazil, the creation of ANDIFES has led to more frequent studies on graduation, retention, and attrition in Brazilian universities, focusing on institutional attrition, characterized by a student's departure from their original course. The University of Brasília (UnB) has implemented strategies to increase student retention in undergraduate courses. This work aimed to develop and test an analysis model to predict attrition in face-to-face courses, through a Systematic Literature Review to identify impact factors and define indicators extracted from UnB's academic systems. The model, named MAGRA, combines indicators with machine learning algorithms to identify students at risk of dropping out. Tests conducted at the Faculdade do Gama (FGA) and UnB indicated that enrollment frequency in subjects could predict completion difficulties. The research suggests that to improve the early identification of at-risk students, adjustments in feedback mechanisms, the inclusion of new systems, improvement in data quality, and adjustments in algorithm parameters are necessary.
  2. Arthur Rocha Temporim de Lacerda, Gamified Chatbot Management Process: A way to build gamified chatbots . Master's Thesis in Applied Computing, University of Brasília (Brazil), 2024 Advisor(s): Sergio Freitas . Tags: Gamification, Machine Learning .
    Chatbot development frameworks provide a range of construction methods; however, established processes like the Chatbot Management Process (CMP) often lack activities specifically tailored to enhance user engagement. This thesis presents the Gamified Chatbot Management Process (GCMP), an extension of the CMP that integrates and adapts activities to boost user interaction with chatbots. Three iterations of the GCMP were created, each incorporating enhancements driven by the Goal-Question-Metric (GQM) methodology. This iterative approach supported the continuous evaluation and evolution of the process. Experiments with real users indicated positive engagement, with all participants successfully meeting the set objectives. Furthermore, the average deployment time was reduced by 66% from the initial to the final version. User assessments also gave top ratings for the quality of responses generated by the chatbot. These results demonstrate the effectiveness of the proposed GCMP. The findings suggest a strong positive correlation between implementing the GCMP and improvements in developing gamified chatbots. The advances in both chatbot functionality and gamification techniques highlight the potential for widespread adoption of the GCMP as a reliable and effective approach for creating gamified chatbots.
  3. Débora Zupeli Bossois, Text categorization methodology from unlabeled documents using an anaphora resolution process . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2010 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence, Machine Learning .
    With the ongoing expansion of electronic textual content, there's a need to organize all this information in a manageable way. Hence, the process of text categorization was developed to ease the manipulation and retrieval of information by dividing it into thematic categories. There are various approaches to achieving an automatic text categorizer, among which the supervised paradigm is the most traditional. Although supervised methodology shows precision comparable to human experts, the requirement for a pre-classified corpus can be a limiting factor in some applications. In these cases, a semi or unsupervised solution, which doesn't demand a complete and well-formed training set for categorizer construction, can be applied; instead, unlabeled documents are provided to the method. Both supervised learning paradigms and semi or unsupervised paradigms typically build a representation of texts based solely on term occurrence, not considering semantic factors. However, many intrinsic characteristics of natural language can make the process ambiguous, one of which is the use of various terms to refer to an entity already mentioned in the text, a linguistic phenomenon known as anaphora. This dissertation proposes a method for the conception of an unsupervised categorizer, utilizing the Nominal Structure of Discourse (NSD), developed by Freitas for anaphora resolution, as a foundation. For this purpose, a bootstrapping technique for categorization is implemented, aiming to obtain initial labeling for documents, which is used to generate a categorization model through the supervised paradigm. Besides being based on the NSD, the methodology of this work benefits directly from the anaphora resolution process, using the identified antecedents for anaphoras during the final categorization phase. This work presents details about the proposed methodology, explaining the developed algorithms, as well as the experiments conducted for method evaluation. 
    Results show that the use of the anaphora resolution process is beneficial for an unsupervised categorization system.
  4. Luana Vieira Morellato, Computational methodology for identification of nominal phrases in Portuguese . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2010 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence .
    Phrases, or syntagms, are units of meaning with syntactic functions within a sentence, as described by Nicola (2008). Generally, the sentences in any statement express content through the elements and combinations thereof provided by the language. This process forms sets and subsets that act as syntactic units within the larger unit of the sentence—syntagms, which can be categorized into nominal and verbal. Among these, nominal syntagms are of particular interest due to their higher semantic value. They are employed in Natural Language Processing (NLP) tasks such as anaphora resolution, automatic ontology building, parsing in medical texts for summary generation and vocabulary creation, or as an initial step in syntactic analysis processes. In Information Retrieval (IR), syntagms can be used to create terms in document indexing and search systems, improving results. This dissertation presents a computational methodology for identifying nominal syntagms in Portuguese language digital documents. It outlines the methodology for identifying and extracting nominal syntagms through the development of SISNOP—System for Identifying Nominal Syntagms in Portuguese. SISNOP, comprising a set of modules and programs, interprets unrestricted texts in natural language through morphological and syntactic analyses to retrieve nominal syntagms, also providing syntactic information such as gender, number, and degree of words within the extracted syntagms. Tested on corpora like CETENFolha and CETEMPúblico, SISNOP recognized 98.12% and 94.59% of sentences, identifying over 24 million syntagms. Its modules—Morphological Tagger, Nominal Syntagms Identifier, and Gender, Number, and Degree Identifier—were individually tested using a smaller dataset due to manual result analysis, achieving a precision of 82.45% and coverage of 69.20%.
  5. Francisco Santiago do Carmo Pereira, A methodology for the use of natural language processing in the search for information in digital documents . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2009 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence, Machine Learning .
    This dissertation presents a novel digital text search methodology grounded in the Nominal Structure of Discourse, expanding on anaphora resolution techniques proposed by Freitas [2005]. It employs these techniques to reveal the textual structure designed by authors, offering a unique approach to Information Retrieval (IR). Traditional IR models, like the vector space model and Latent Semantic Indexing, primarily use document terms for representation and search, often failing to account for the nuances of natural language, such as the use of anaphoras. These linguistic features can diminish the representational efficacy of classical models by complicating the identification of key entities within texts. To address these limitations, an alternative structural model that incorporates anaphora resolution into its computational representation of documents is introduced, based on Seibel Júnior's work [Seibel Júnior and Freitas, 2007]. This model, which focuses on the Discourse Nominal Structure for Searches (DNSS), aims to improve upon existing IR methodologies by ensuring a more nuanced and effective representation of documents. It does so by identifying and utilizing the central elements, or focuses, of text sentences, while also considering additional information provided by the Nominal Structure of Discourse (NSD). The dissertation elaborates on the development of this search structure, detailing the anaphora resolution process and its impact on enhancing the representation and search result quality of documents. Through the exploration of algorithms and experimentation, the study underscores the potential of this new methodology in advancing the field of Information Retrieval.
  6. Hilário Seibel Júnior, Retrieval of relevant information in digital documents based on anaphora resolution . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2007 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence .
    Traditional Information Retrieval (IR) methods are primarily based on counting the frequency of word occurrences in a document, without offering solutions for interpreting the semantic content of the discourse [Van Rijsbergen 1979; Baeza-Yates and Ribeiro-Neto 1999]. By not interpreting the analyzed document, such methods may overlook important information about it. A solution to overcome this issue, mentioned in [Salton and McGill 1986], is to use Natural Language Processing (NLP) in information retrieval. One application of NLP is the processing of anaphoras. Anaphora [Carter 1987; Beaver 2004] is a linguistic phenomenon where an entity introduced a priori is referenced later in another sentence through some linguistic expression, as in "Valentina was born in São Paulo. The girl is a Pisces." Anaphora resolution identifies that the term "girl" in the second sentence references the entity introduced in the discourse by the term "Valentina" from the first sentence. This allows us to assert that Valentina is more relevant to the text than if such reference had not occurred. Freitas proposes in [Freitas 2005] a method to resolve anaphoras in a document by creating a structure that allows tracking entities that remain prominent throughout the discourse. This structure stores information that can be leveraged by an information retrieval method. This dissertation proposes a computational methodology to retrieve relevant information from the resolution of anaphoras in a document, aiming to improve the quality of query results. Anaphora resolution enables precise identification of the number of times each entity is referenced in a discourse, revealing entities and connections that may have been obscured in the original discourse. This information makes it possible to decide whether a certain entity is more relevant than another in the document, focusing more on what the author wrote. 
    Thus, the relevant documents retrieved are ranked by the amount of information they present regarding the searched terms, and not merely by the location and/or number of occurrences of such terms. This work also allows identifying, through the structure generated by anaphora processing, synonymous terms (those that reference the same entity). If the document indicates that two terms are synonymous, searching for one will return the same result as searching for the other, further increasing the quality of query results. This work presents the details of the proposed methodology - the measures used to calculate the relevance of a term in relation to the document interpreted through anaphora processing, the procedures necessary for conducting a query, the implemented prototype, and the analysis of its time complexity. Furthermore, the characteristics of this approach that differentiate it from traditional methods regarding the quality of the obtained results are evaluated.
  7. Ayrton Monteiro Cristo Filho, Computational interpretation of the simple future perfect tense in narratives for Brazilian Portuguese . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2004 Advisor(s): Berilhes Borges Garcia, Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence .
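
Item 6 above argues that resolving anaphora makes it possible to count how often each entity is really referenced, and that terms referring to the same entity can be treated as synonyms at query time. The following is a minimal Python sketch of that counting idea only. It assumes the coreference chains are already available as input; it does not implement anaphora resolution or the Nominal Structure of Discourse, and the names `entity_relevance` and `chains` are hypothetical.

```python
from collections import Counter

def entity_relevance(tokens, chains):
    """Count references per entity after mapping each mention to its entity.

    tokens: list of lowercased word tokens of one document.
    chains: dict mapping a canonical entity name to the set of surface
            forms that refer to it (assumed output of some resolver).
    """
    # Invert the chains: surface form -> canonical entity.
    mention_to_entity = {}
    for entity, mentions in chains.items():
        for mention in mentions:
            mention_to_entity[mention] = entity

    counts = Counter()
    for token in tokens:
        entity = mention_to_entity.get(token)
        if entity is not None:
            counts[entity] += 1
    return counts

# The example sentence pair from the abstract, lowercased and pre-tokenized.
text = "valentina was born in são paulo . the girl is a pisces ."
tokens = text.split()

# Hypothetical resolver output: "girl" refers back to "valentina".
chains = {"valentina": {"valentina", "girl"}}

plain = Counter(tokens)["valentina"]                   # term counting only
resolved = entity_relevance(tokens, chains)["valentina"]  # anaphora-aware
print(plain, resolved)
```

With plain term counting the entity is seen once; with the assumed chain it is seen twice, which is the extra evidence of relevance the abstract describes. Looking a query term up in the same inverted mapping would give the synonym behavior (a search for "girl" resolving to the entity "valentina").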

Undergraduate Research

Bachelor’s Thesis

  1. Ícaro Oliveira Augusto Silva, Usability Information Extraction from App Comments on the Play Store . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2023 Advisor(s): Cristiane Ramos . Tags: Natural Language Processing, Software Quality .
  2. Luiz Henrique Fernandes Zamprogno, Gamified Chatbot for Academic Information Inquiry . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2022 Advisor(s): Sergio Freitas . Tags: Gamification, Artificial Intelligence .
  3. Victor Eduardo Araujo Ribeiro, Gamified Chatbot for Academic Information Inquiry . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2022 Advisor(s): Sergio Freitas . Tags: Gamification, Artificial Intelligence .
  4. Yeltsin Suares Gama, Learning Support System . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2017 Advisor(s): Sergio Freitas . Tags: Learning Analytics, Artificial Intelligence .
  5. Cristóvão de Lima Frinhani, Application of Natural Language Processing - A Support Tool for Correcting Essay Questions . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2016 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence, Machine Learning .
    In the context of distance education (DE), which has been experiencing global expansion, one of the most significant challenges is the scalability of evaluating essay questions. This challenge impacts the efficiency of tutors and the quality of feedback provided to students, as grading such questions requires considerable time. To address this issue, a support system for pre-evaluation was developed, utilizing natural language processing (NLP) and machine learning (ML) to automate the assignment of preliminary grades to student responses. The proposed system is divided into two main parts - the first applies semantic similarity algorithms to compare student responses with provided answer keys; the second part is a web interface that facilitates the registration of questions and answer keys, submission of responses by students, and allows the tutor to evaluate the responses. The accuracy of the system is continuously improved through tutor feedback, enabling ML to adjust the evaluation criteria as needed. The system was tested in the course on the fundamentals of computer architecture, where it demonstrated the ability to assign meaningful grades to student responses and revealed an improvement in the accuracy of evaluations based on tutor feedback. It is concluded that the pre-evaluation support system offers a promising solution to the challenges faced in evaluating essay questions in DE environments, enhancing the scalability of the grading process and the quality of feedback provided, with potential for application across various fields of knowledge.
  6. Igor Ramos, Proposal of a Methodology for Map Creation in Systematic Mapping Studies Based on Knowledge Related to Ontologies . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2015 Advisor(s): Fabiana Mendes . Tags: Digital Transformation, Artificial Intelligence .
  7. Greg Ouyama Martins, Evaluation of algorithms for sorting digital documents retrieved in search . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2013 Advisor(s): Sergio Freitas . Tags: Artificial Intelligence, Natural Language Processing .
    The search for information began in libraries, where data could be retrieved by consulting catalog cards that organized books by title, author, year, or publisher. With technological advancements, this process was automated, allowing these tasks to be performed on computers. However, due to the vast volume of available information, efficiently finding specific information can be challenging, making the search process exhausting and complex. To address this issue, there are studies and implementations concerning the sorting of retrieved information. It is also relevant to apply techniques for customized queries based on criteria defined by users. This work aims to analyze the effectiveness of dynamic and static algorithms in sorting retrieved information, exploring the customization of queries in open-source search engines. We will evaluate which algorithm offers greater accuracy, using precision-recall metrics, which allow for a degree of customization in queries, including user profiles from the four engineering courses at the University of Brasília - Faculdade do Gama, and how software engineering can enhance the sorting and retrieval of information.
  8. Luis Bruno Fidelis Gomes, Business Intelligence Applied to Health: Monitoring Dashboards for Health Professionals in the Psychosocial Care Network . Bachelor of Software Engineering - University of Brasília (Brazil), 2023 Advisor(s): George Marsicano . Tags: Artificial Intelligence .
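
Item 5 above describes assigning a preliminary grade by comparing a student response with an answer key. The following Python sketch illustrates that comparison step under strong assumptions: the abstract does not name its semantic similarity algorithms, so plain term-overlap cosine similarity is used here as a stand-in, and the function names, the 0 to 10 scale, and the sample texts are all hypothetical. The tutor feedback loop is not modeled.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def preliminary_grade(answer, answer_key, max_grade=10.0):
    """Scale the similarity to a grade; the 0-10 scale is an assumption."""
    similarity = cosine_similarity(bag_of_words(answer), bag_of_words(answer_key))
    return round(max_grade * similarity, 2)

# Hypothetical answer key and student responses.
answer_key = "The CPU fetches, decodes and executes instructions"
print(preliminary_grade("The CPU fetches, decodes and executes instructions", answer_key))
print(preliminary_grade("The CPU executes instructions", answer_key))
print(preliminary_grade("", answer_key))
```

Term overlap cannot recognize paraphrases, which is the main reason a real system would use a semantic measure instead; the sketch only shows where such a measure plugs in and why the result is a pre-evaluation for the tutor to confirm rather than a final grade.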

Current Projects

Publications and Productions

Publications (30)

  1. Éber Júnio Borges Moreira,Sergio Antônio Andrade Freitas, A CP-SAT Approach for Academic Resource Timetabling in Higher Education Institutions: A Case Study at a Major Public University , in IEEE International Conference on IT in Higher Education and Training (ITHET 2024) , to appear, 2024 . Tags: Machine Learning, Learning Analytics .
  2. LACERDA, A. R. T.,FREITAS, S. A. A.,RAMOS, C. S., Gamified Chatbot Management Process: A way to build gamified chatbots , in 10th Intelligent Systems Conference , 2024 . DOI: 10.1007/978-3-031-66428-1_2 . Tags: Gamification, Machine Learning .
    This paper proposes the incorporation of gamification with machine learning for the development of chatbots. The Gamified Chatbot Management Process (GCMP) is a process for developing gamified chatbots; it comprises eight activities, arranged into four steps, with an emphasis on gamification implementation. The activities are gamification planning, gamification management, updating chatbot content, chatbot behavior implementation, chatbot behavior validation, chatbot behavior analysis, chatbot delivery, and chatbot usage analysis. GCMP provides a clear and structured guide while allowing flexibility to accommodate each project’s specific requirements. This article describes the methodology employed, which includes an experiment with software engineering students. The experiment is conducted by providing documents, holding weekly meetings, and collecting pertinent data. The applicability of GCMP in gamified chatbot projects is examined, a new version of the process is proposed to resolve the gaps found, and conclusions are drawn based on the experiments.
  3. SILVA, E. C.,FREITAS, S. A. A.,RAMOS, C. S.,MENEZES, A. E. M.,ARAUJO, L. K. S. R., A systematic review of the factors that impact the prediction of retention and dropout in higher education , in 56th Hawaii International Conference on System Science , 2023 . Tags: Learning Analytics, Machine Learning .
  4. Edmilson Cosme da Silva, Prediction of academic dropout in higher education - the case of face-to-face undergraduate courses at the University of Brasília . Master's Thesis in Applied Computing, University of Brasília (Brazil), 2023 Advisor(s): Sergio Freitas . Tags: Learning Analytics, Machine Learning .
    Researchers have been investigating attrition in higher education, identifying two categories - students who leave the university and those who abandon higher education altogether, negatively affecting institutions, students, and society. Since 1995, in Brazil, the creation of ANDIFES has led to more frequent studies on graduation, retention, and attrition in Brazilian universities, focusing on institutional attrition, characterized by a student's departure from their original course. The University of Brasília (UnB) has implemented strategies to increase student retention in undergraduate courses. This work aimed to develop and test an analysis model to predict attrition in face-to-face courses, through a Systematic Literature Review to identify impact factors and define indicators extracted from UnB's academic systems. The model, named MAGRA, combines indicators with machine learning algorithms to identify students at risk of dropping out. Tests conducted at the Faculdade do Gama (FGA) and UnB indicated that enrollment frequency in subjects could predict completion difficulties. The research suggests that to improve the early identification of at-risk students, adjustments in feedback mechanisms, the inclusion of new systems, improvement in data quality, and adjustments in algorithm parameters are necessary.
  5. Arthur Rocha Temporim de Lacerda, Gamified Chatbot Management Process: A way to build gamified chatbots . Master's Thesis in Applied Computing, University of Brasília (Brazil), 2024 Advisor(s): Sergio Freitas . Tags: Gamification, Machine Learning .
    Chatbot development frameworks provide a range of construction methods; however, established processes like the Chatbot Management Process (CMP) often lack activities specifically tailored to enhance user engagement. This thesis presents the Gamified Chatbot Management Process (GCMP), an extension of the CMP that integrates and adapts activities to boost user interaction with chatbots. Three iterations of the GCMP were created, each incorporating enhancements driven by the Goal-Question-Metric (GQM) methodology. This iterative approach supported the continuous evaluation and evolution of the process. Experiments with real users indicated positive engagement, with all participants successfully meeting the set objectives. Furthermore, the average deployment time was reduced by 66% from the initial to the final version. User assessments also gave top ratings for the quality of responses generated by the chatbot. These results demonstrate the effectiveness of the proposed GCMP. The findings suggest a strong positive correlation between implementing the GCMP and improvements in developing gamified chatbots. The advances in both chatbot functionality and gamification techniques highlight the potential for widespread adoption of the GCMP as a reliable and effective approach for creating gamified chatbots.
  6. Ícaro Oliveira Augusto Silva, Usability Information Extraction from App Comments on the Play Store . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2023 Advisor(s): Cristiane Ramos . Tags: Natural Language Processing, Software Quality .
  7. Luiz Henrique Fernandes Zamprogno, Gamified Chatbot for Academic Information Inquiry . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2022 Advisor(s): Sergio Freitas . Tags: Gamification, Artificial Intelligence .
  8. Victor Eduardo Araujo Ribeiro, Gamified Chatbot for Academic Information Inquiry . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2022 Advisor(s): Sergio Freitas . Tags: Gamification, Artificial Intelligence .
  9. FREITAS, S. A. A.,CANEDO, E. D.,FRINHANI, C. L.,FERNANDES, M. V.,SILVA, M. C., Evaluation of an Automatic Essay Correction System Used as an Assessment Tool , in 19th International Conference on Human-Computer Interaction , 2017 . DOI: 10.1007/978-3-319-58700-4_18 . Tags: Machine Learning, Natural Language Processing .
  10. Yeltsin Suares Gama, Learning Support System . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2017 Advisor(s): Sergio Freitas . Tags: Learning Analytics, Artificial Intelligence .
  11. FREITAS, S. A. A.,SILVA, RITA DE CASSIA,LUCENA, TIAGO FRANKLIN R.,RIBEIRO, EDUARDO DO N.,LIMA, VICTOR COTRIM DE,SILVA, RODRIGO M. S. DA, Smart Quizzes in the Engineering Education , in 2016 49th Hawaii International Conference on System Sciences (HICSS) , 2016 . DOI: 10.1109/HICSS.2016.17 . Tags: Artificial Intelligence, Education .
  12. FRINHANI, CRISTOVAO LIMA,FREITAS, S. A. A.,FERNANDES, MAURICIO VIDOTTI,DIAS CANEDO, EDNA, An automatic essay correction for an active learning environment , in 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA) , 2016 . DOI: 10.1109/AICCSA.2016.7945769 . Tags: Artificial Intelligence, Machine Learning, Natural Language Processing .
  13. Cristóvão de Lima Frinhani, Application of Natural Language Processing - A Support Tool for Correcting Essay Questions . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2016 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence, Machine Learning .
    In the context of distance education (DE), which has been experiencing global expansion, one of the most significant challenges is the scalability of evaluating essay questions. This challenge impacts the efficiency of tutors and the quality of feedback provided to students, as grading such questions requires considerable time. To address this issue, a support system for pre-evaluation was developed, utilizing natural language processing (NLP) and machine learning (ML) to automate the assignment of preliminary grades to student responses. The proposed system is divided into two main parts - the first applies semantic similarity algorithms to compare student responses with provided answer keys; the second part is a web interface that facilitates the registration of questions and answer keys, submission of responses by students, and allows the tutor to evaluate the responses. The accuracy of the system is continuously improved through tutor feedback, enabling ML to adjust the evaluation criteria as needed. The system was tested in the course on the fundamentals of computer architecture, where it demonstrated the ability to assign meaningful grades to student responses and revealed an improvement in the accuracy of evaluations based on tutor feedback. It is concluded that the pre-evaluation support system offers a promising solution to the challenges faced in evaluating essay questions in DE environments, enhancing the scalability of the grading process and the quality of feedback provided, with potential for application across various fields of knowledge.
  14. MACIEL, C.,SOUZA, P. C.,VITERBO, J.,MENDES, F. F.,SEGHROUCHNI, A., A Multi-agent Architecture to Support Ubiquitous Applications in Smart Environments , COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE (PRINT) , 498(106), 2015 . DOI: 10.1007/978-3-662-46241-6_9 . Tags: Artificial Intelligence .
  15. Igor Ramos, Proposal of a Methodology for Map Creation in Systematic Mapping Studies Based on Knowledge Related to Ontologies . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2015 Advisor(s): Fabiana Mendes . Tags: Digital Transformation, Artificial Intelligence .
  16. MACIEL, CRISTIANO,de Souza, Patricia Cristiane,Viterbo, José,Mendes, Fabiana Freitas,El Fallah Seghrouchni, Amal, A Multi-agent Architecture to Support Ubiquitous Applications in Smart Environments , 2015 . DOI: 10.1007/978-3-662-46241-6_9 . Tags: Artificial Intelligence .
  17. Greg Ouyama Martins, Evaluation of algorithms for sorting digital documents retrieved in search . Senior Project (Bachelor of Software Engineering) - University of Brasília (Brazil), 2013 Advisor(s): Sergio Freitas . Tags: Artificial Intelligence, Natural Language Processing .
    The search for information began in libraries, where data could be retrieved by consulting catalog cards that organized books by title, author, year, or publisher. With technological advancements, this process was automated, allowing these tasks to be performed on computers. However, due to the vast volume of available information, efficiently finding specific information can be challenging, making the search process exhausting and complex. To address this issue, there are studies and implementations concerning the sorting of retrieved information. It is also relevant to apply techniques for customized queries based on criteria defined by users. This work aims to analyze the effectiveness of dynamic and static algorithms in sorting retrieved information, exploring the customization of queries in open-source search engines. We will evaluate which algorithm offers greater accuracy, using precision-recall metrics, which allow for a degree of customization in queries, including user profiles from the four engineering courses at the University of Brasília - Faculdade do Gama, and how software engineering can enhance the sorting and retrieval of information.
  18. Débora Zupeli Bossois, Text categorization methodology from unlabeled documents using an anaphora resolution process . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2010 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence, Machine Learning .
    With the ongoing expansion of electronic textual content, there is a need to organize all this information in a manageable way. Hence, the process of text categorization was developed to ease the manipulation and retrieval of information by dividing it into thematic categories. There are various approaches to building an automatic text categorizer, among which the supervised paradigm is the most traditional. Although the supervised methodology shows precision comparable to that of human experts, the requirement for a pre-classified corpus can be a limiting factor in some applications. In these cases, a semi-supervised or unsupervised solution can be applied: it does not demand a complete and well-formed training set for categorizer construction, and unlabeled documents are provided to the method instead. Both the supervised and the semi-supervised or unsupervised paradigms typically build a representation of texts based solely on term occurrence, without considering semantic factors. However, many intrinsic characteristics of natural language can make the process ambiguous, one of which is the use of various terms to refer to an entity already mentioned in the text, a linguistic phenomenon known as anaphora. This dissertation proposes a method for building an unsupervised categorizer, using the Nominal Structure of Discourse (NSD), developed by Freitas for anaphora resolution, as a foundation. For this purpose, a bootstrapping technique for categorization is implemented to obtain an initial labeling of the documents, which is then used to generate a categorization model through the supervised paradigm. Besides being based on the NSD, the methodology benefits directly from the anaphora resolution process, using the antecedents identified for anaphoras during the final categorization phase. This work presents details of the proposed methodology, explaining the developed algorithms as well as the experiments conducted to evaluate the method. Results show that the use of the anaphora resolution process is beneficial for an unsupervised categorization system.
  19. Luana Vieira Morellato, Computational methodology for identification of nominal phrases in Portuguese . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2010 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence .
    Phrases, or syntagms, are units of meaning with syntactic functions within a sentence, as described by Nicola (2008). Generally, the sentences in any statement express content through the elements, and combinations of elements, provided by the language. This process forms sets and subsets that act as syntactic units within the larger unit of the sentence: the syntagms, which can be categorized as nominal or verbal. Among these, nominal syntagms are of particular interest due to their higher semantic value. They are employed in Natural Language Processing (NLP) tasks such as anaphora resolution, automatic ontology building, parsing of medical texts for summary generation and vocabulary creation, or as an initial step in syntactic analysis processes. In Information Retrieval (IR), syntagms can be used to create terms in document indexing and search systems, improving results. This dissertation presents a computational methodology for identifying nominal syntagms in Portuguese-language digital documents. It outlines the methodology for identifying and extracting nominal syntagms through the development of SISNOP, the System for Identifying Nominal Syntagms in Portuguese. SISNOP, comprising a set of modules and programs, interprets unrestricted natural language texts through morphological and syntactic analyses to retrieve nominal syntagms, also providing syntactic information such as the gender, number, and degree of words within the extracted syntagms. Tested on corpora like CETENFolha and CETEMPúblico, SISNOP recognized 98.12% and 94.59% of sentences, identifying over 24 million syntagms. Its modules (Morphological Tagger, Nominal Syntagms Identifier, and Gender, Number, and Degree Identifier) were individually tested on a smaller dataset, because the results were analyzed manually, achieving a precision of 82.45% and coverage of 69.20%.
  20. Pereira, F. S. C.; Seibel, Hilário; Freitas, S. A. A., An Anaphora Based Information Retrieval Model Extension , in 2009 World Congress on Computer Science and Information Engineering, 2009, Los Angeles , pp. 330-334, 2009 . DOI: 10.1109/CSIE.2009.833 . Tags: Natural Language Processing, Artificial Intelligence .
  21. Pereira, F. S. C.; Morellato, L.; Freitas, S. A. A., Evaluation of an information retrieval model based in anaphora resolution , in IADIS International Conference on WWW/Internet, 2009, Roma , pp. 334-339, 2009 . Tags: Natural Language Processing, Artificial Intelligence .
  22. Francisco Santiago do Carmo Pereira, A methodology for the use of natural language processing in the search for information in digital documents . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2009 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence, Machine Learning .
    This dissertation presents a novel digital text search methodology grounded in the Nominal Structure of Discourse, expanding on anaphora resolution techniques proposed by Freitas [2005]. It employs these techniques to reveal the textual structure designed by authors, offering a unique approach to Information Retrieval (IR). Traditional IR models, like the vector space model and Latent Semantic Indexing, primarily use document terms for representation and search, often failing to account for the nuances of natural language, such as the use of anaphoras. These linguistic features can diminish the representational efficacy of classical models by complicating the identification of key entities within texts. To address these limitations, an alternative structural model that incorporates anaphora resolution into its computational representation of documents is introduced, based on Seibel Júnior's work [Seibel Júnior and Freitas, 2007]. This model, which focuses on the Discourse Nominal Structure for Searches (DNSS), aims to improve upon existing IR methodologies by ensuring a more nuanced and effective representation of documents. It does so by identifying and utilizing the central elements, or focuses, of text sentences, while also considering additional information provided by the Nominal Structure of Discourse (NSD). The dissertation elaborates on the development of this search structure, detailing the anaphora resolution process and its impact on enhancing the representation and search result quality of documents. Through the exploration of algorithms and experimentation, the study underscores the potential of this new methodology in advancing the field of Information Retrieval.
  23. Seibel, Hilário; Freitas, S. A. A., Methodology for retrieval of relevant information in digital documents based on anaphora resolution , in XXXIII Latin American Conference on Informatics CLEI 2007, 2007, San José - Costa Rica , 2007 . Tags: Natural Language Processing, Artificial Intelligence .
  24. Hilário Seibel Júnior, Retrieval of relevant information in digital documents based on anaphora resolution . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2007 Advisor(s): Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence .
    Traditional Information Retrieval (IR) methods are primarily based on counting the frequency of word occurrences in a document, without offering solutions for interpreting the semantic content of the discourse [Van Rijsbergen 1979; Baeza-Yates and Ribeiro-Neto 1999]. By not interpreting the analyzed document, such methods may overlook important information about it. A solution to overcome this issue, mentioned in [Salton and McGill 1986], is to use Natural Language Processing (NLP) in information retrieval. One application of NLP is the processing of anaphoras. Anaphora [Carter 1987; Beaver 2004] is a linguistic phenomenon where an entity introduced earlier is referenced later in another sentence through some linguistic expression, as in "Valentina was born in São Paulo. The girl is a Pisces." Anaphora resolution identifies that the term "girl" in the second sentence references the entity introduced in the discourse by the term "Valentina" in the first sentence. This allows us to assert that Valentina is more relevant to the text than if such a reference had not occurred. Freitas proposes in [Freitas 2005] a method to resolve anaphoras in a document by creating a structure that allows tracking the entities that remain prominent throughout the discourse. This structure stores information that can be leveraged by an information retrieval method. This dissertation proposes a computational methodology to retrieve relevant information from the resolution of anaphoras in a document, aiming to improve the quality of query results. Anaphora resolution enables precise identification of the number of times each entity is referenced in a discourse, revealing entities and connections that may have been obscured in the original discourse. This information makes it possible to decide whether a certain entity is more relevant than another in the document, focusing more on what the author wrote. Thus, the relevant documents retrieved are ranked by the amount of information they present regarding the searched terms, and not merely by the location and/or number of occurrences of such terms. This work also allows identifying, through the structure generated by anaphora processing, synonymous terms (those that reference the same entity). If the document indicates that two terms are synonymous, searching for one will return the same result as searching for the other, further increasing the quality of query results. This work presents the details of the proposed methodology: the measures used to calculate the relevance of a term in relation to the document interpreted through anaphora processing, the procedures necessary for conducting a query, the implemented prototype, and the analysis of its time complexity. Furthermore, the characteristics of this approach that differentiate it from traditional methods regarding the quality of the obtained results are evaluated.
  25. Barbosa, H. A.; Freitas, S. A. A., Intrusion Detection Systems in Industrial Automation Networks , in 6th International Congress on Automation, Systems and Instrumentation, 2006, São Paulo , 2006 . Tags: Artificial Intelligence .
  26. Sergio Antônio Andrade de Freitas, Automated text interpretation - anaphora processing . PhD Thesis, Federal University of Espirito Santo (Brazil), 2005 . Tags: Natural Language Processing, Artificial Intelligence .
    This thesis presents a solution to the interpretation of definite descriptions in Portuguese. For example, consider the following text: (1) a. Mariana bought a new car. b. The engine was damaged. Sentence (1a) introduces two entities: Mariana and a car, which is new. Sentence (1b) introduces only one entity, the engine. In a human or computer interpretation process, the use of the definite article "the" preceding a noun indicates that the introduced entity was already present in the discourse, i.e., it is an anaphoric entity. The resolution of an anaphora is a reference problem, but in example (1) there is another problem: although the car is the entity that gives context to the engine, we cannot say that the engine is the car (as we could for a pronominal anaphora). It must also be determined how the engine is related to the car. This is a definite description problem. The interpretation of any kind of anaphora can be represented by the following equation: R(A, T) (2), where A denotes an entity introduced by the context interpretation of a pronoun, an ellipsis, or a definite noun phrase, T denotes its antecedent, and R is the relation between A and T. The equation's resolution process is summarized as: given A, find T and R. This thesis proposes a methodology for the interpretation of definite descriptions in which the relation R is one of: part of, member of, subcategorized by, and corefers. These relations are obtained by a set of pragmatic rules [Freitas, Lopes e Menezes 2004, Filho e Freitas 2003], which are defined here (chapter 3). Also, if A is not anaphoric, then it is accommodated in the discourse context. The computational methodology is implemented in a logic programming system [Damásio, Nejdl e Pereira 1994] that permits abductive reasoning [Kakas, Kowalski e Toni 1992] over the semantic representation of the discourse [Kamp e Reyle 1993]. The interpretation of the entities is the basis for the Discourse Nominal Structure [Lopes e Freitas 1994] (chapter 4), which allows: (1) tracking the most salient entities at each sentence [Freitas e Lopes 1994], (2) limiting the number of possible antecedents [Freitas e Lopes 1996], and (3) giving a summary of the discourse entities. The result is an integrated methodology to solve anaphors and ellipses. Finally, the Nominal Structure of the Discourse can help the search/index of digital documents.
  27. Ayrton Monteiro Cristo Filho, Computational interpretation of the simple future perfect tense in narratives for Brazilian Portuguese . Master's Thesis in Informatics, Federal University of Espirito Santo (Brazil), 2004 Advisor(s): Berilhes Borges Garcia, Sergio Freitas . Tags: Natural Language Processing, Artificial Intelligence .
  28. Cristo Filho, A. M.; Freitas, S. A. A., Interpretation of the Future of the Past in Narratives , in 1º Workshop em Tecnologia da Informação e da Linguagem Humana, 2003, São Carlos - SP , 2003 . Tags: Artificial Intelligence, Natural Language Processing .
  29. Sergio Antônio Andrade de Freitas, Deixis and pronominal anaphora in dialogs . Master's Thesis in Informatics, Federal University of Rio Grande do Sul (Brazil), 1993 . Tags: Natural Language Processing, Artificial Intelligence .
    The proposal of this work is to implement a set of dialog elements expressed by two human agents. The pronominal anaphora and some deictic pronouns (in Portuguese: I, you, your, my) that may appear during the dialog are resolved. Basically, this work is divided into four parts: 1. An introductory study of the Discourse Representation Theory (DRT) [KAM88, KAM90]. The DRT is a formalism for discourse representation that uses models for semantic evaluation of the representation structures.
  30. Luis Bruno Fidelis Gomes, Business Intelligence Applied to Health: Monitoring Dashboards for Health Professionals in the Psychosocial Care Network . Bachelor of Software Engineering - University of Brasília (Brazil), 2023 Advisor(s): George Marsicano . Tags: Artificial Intelligence .
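
The abstract of item 24 argues that resolving anaphoras changes how often each entity is counted as referenced, and that documents can then be ranked by those counts. The following is only a minimal sketch of that idea, not the method from the dissertation: the function names, the `antecedents` mapping (standing in for the output of an anaphora resolver), and the plain reference count used as a score are all assumptions made for illustration.

```python
from collections import Counter


def entity_reference_counts(mentions, antecedents):
    """Count references per entity after replacing each anaphoric term
    with its antecedent.

    mentions: terms in document order.
    antecedents: hypothetical resolver output, mapping an anaphoric term
                 to the entity it refers to.
    """
    resolved = (antecedents.get(term, term) for term in mentions)
    return Counter(resolved)


def rank_by_entity(documents, query_entity):
    """Order document ids by how often the queried entity is referenced
    (most references first; ties broken by document id)."""
    scores = {}
    for doc_id, (mentions, antecedents) in documents.items():
        counts = entity_reference_counts(mentions, antecedents)
        scores[doc_id] = counts[query_entity]
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy data based on the example in the abstract:
# "Valentina was born in São Paulo. The girl is a Pisces."
documents = {
    "doc_a": (["Valentina", "São Paulo", "girl", "Pisces"], {"girl": "Valentina"}),
    "doc_b": (["Valentina", "São Paulo"], {}),
}
ranking = rank_by_entity(documents, "Valentina")
```

In this toy data, "girl" is counted as a second reference to Valentina in `doc_a`, so `doc_a` ranks ahead of `doc_b`, where the entity is mentioned only once. The relevance measures actually used in the dissertation are not given in the abstract and may differ substantially.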

Apps and programs

Contact & Collaboration

For more information or to contact the team: sergiofreitas@unb.br.
