April 25-26, 2026, Copenhagen, Denmark
Inés Pérez 1, Lara Suárez 1, Gabriel Novas 1, and Santiago Muíños 1
Automatic retrieval and standardization of material data from scientific literature remains a critical challenge. This work introduces an artificial intelligence-based framework that extracts and structures material property data and converts it into the International System of Units using state-of-the-art open-source models (Mistral, LLaMA3). The system is highly modular, enabling independent development of specific information retrieval tasks, including PDF parsing, semantic table interpretation, text segmentation, reranking and embedding models for retrieval, and post-processing for data structuring, format validation, and unit conversion. Both supervised and unsupervised evaluation methods are employed to quantify accuracy, consistency, and confidence in the extracted data. By leveraging multiple complementary PDF parsing configurations, we implemented a robust unsupervised evaluation strategy that enhances output reliability. Large Language Models (LLMs) were systematically evaluated using both zero-shot and in-context learning, complemented by a custom scoring system for supervised assessment. The framework was applied to two representative manufacturing use cases (metal additive manufacturing and fiber-reinforced composites) to address the challenge of incomplete, scattered, and heterogeneous material property data in simulation workflows. Results show high extraction accuracy across diverse document layouts, with complementary parsing strategies improving LLM comprehension and tabular processing further boosting overall performance.
Information Retrieval, PDF Parsing, LLMs, Data Standardization, Material Properties
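The abstract above mentions unit conversion to SI and an unsupervised evaluation built on several PDF parsing configurations. The following is a minimal sketch of how those two ideas could fit together, not the authors' implementation: the conversion table `TO_SI`, the helper names `to_si` and `consensus`, and the use of majority agreement as a confidence score are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny conversion table: unit -> (SI unit, factor).
TO_SI = {
    "GPa": ("Pa", 1e9),
    "MPa": ("Pa", 1e6),
    "Pa": ("Pa", 1.0),
    "mm": ("m", 1e-3),
    "m": ("m", 1.0),
}


def to_si(value, unit):
    """Convert a (value, unit) pair to SI using the table above."""
    si_unit, factor = TO_SI[unit]
    return value * factor, si_unit


def consensus(extractions):
    """Return (SI value, SI unit, agreement share) for one property.

    `extractions` holds the (value, unit) pairs that different parsing
    configurations produced for the same property. The share of
    configurations agreeing on the most common SI value serves as an
    assumed, label-free confidence score.
    """
    normalized = [to_si(value, unit) for value, unit in extractions]
    # Compare values at six significant digits to absorb float noise.
    keys = [(f"{value:.6g}", unit) for value, unit in normalized]
    (value_text, si_unit), count = Counter(keys).most_common(1)[0]
    return float(value_text), si_unit, count / len(keys)


# Three hypothetical parser outputs for the same Young's modulus entry.
value, unit, share = consensus([(210, "GPa"), (210000, "MPa"), (21, "GPa")])
print(value, unit, share)
```

In this sketch, two of the three configurations agree once their values are expressed in pascals, so the third reading would be flagged as the outlier. A real system would need a fuller unit table and a tolerance suited to each property.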
Bin Ge 1, Chunhui He 1, Qingqing Zhao 1, Chong Zhang 1, Jibing Wu 1
1Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
2Unit 96941, Beijing 100085, China
Identifying high-quality frontier topics from massive scientific research data helps researchers direct their work accurately. Traditional analysis methods face bottlenecks such as limited cross-domain adaptability, high resource consumption, and low efficiency. To address these challenges, this study proposes an AI-agent-based frontier topic mining method built on a generative-verification dual-agent (D-Agents) architecture. Specifically, prompt engineering is employed to develop a generative agent (G-Agent), which leverages the semantic understanding capabilities of large-scale pre-trained language models to automatically generate candidate frontier topics. Subsequently, a verification agent (V-Agent) is introduced to establish a multi-dimensional evaluation system, which automatically verifies candidate topics along dimensions including academic novelty, topic accuracy, and completeness of frontier topics. The effectiveness of the proposed method is validated on three manually labeled datasets in computer vision (CV), natural language processing (NLP), and machine learning (ML). Experimental results demonstrate that the D-Agents framework can perform frontier topic mining tasks across multiple domains simultaneously. On the three labeled datasets (CV-DataSet, NLP-DataSet, and ML-DataSet), D-Agents achieves a precision exceeding 74% while maintaining a recall over 85%. Compared to traditional bibliometric methods, the proposed approach significantly improves precision and recall in frontier topic mining across three distinct fields: altitude sickness, recommendation systems, and oyster reef ecosystems, with performance exceeding 67%. The D-Agents framework effectively mitigates the hallucination issue of G-Agent through its automatic generation and self-verification mechanism, thereby substantially enhancing the efficiency of frontier topic mining.
LLMs, Frontier Topics, Prompt Engineering, D-Agents, G-Agent, V-Agent, RAG
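The generate-then-verify flow and the precision and recall figures in the abstract above can be illustrated with a short sketch. It is an assumption-laden outline rather than the D-Agents code: `mine_topics`, `precision_recall`, the `threshold` parameter, and the idea that the verifier returns a single score in [0, 1] are hypothetical, and the two agents are passed in as plain callables so that no LLM API is implied.

```python
def mine_topics(documents, generate, verify, threshold=0.5):
    """Keep only the generated candidates that the verifier accepts.

    `generate(documents)` stands in for the G-Agent and returns candidate
    topics; `verify(topic, documents)` stands in for the V-Agent and
    returns an assumed score in [0, 1].
    """
    candidates = generate(documents)
    return [t for t in candidates if verify(t, documents) >= threshold]


def precision_recall(predicted, gold):
    """Set-based precision and recall against manually labeled topics."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy stand-ins for the two agents: the verifier accepts a topic only
# if it literally appears in at least one document.
docs = ["vision transformers for segmentation", "diffusion models for video"]
toy_generate = lambda d: ["vision transformers", "diffusion models", "quantum cats"]
toy_verify = lambda topic, d: 1.0 if any(topic in text for text in d) else 0.0

kept = mine_topics(docs, toy_generate, toy_verify)
print(kept, precision_recall(kept, ["vision transformers", "diffusion models"]))
```

The toy verifier drops the unsupported candidate, which is the role the abstract assigns to the V-Agent in limiting hallucinated topics.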
Arnold Valaro 1, Lloyd Matak 2
1Independent Researcher
2Lewis-Clark State College, USA
Access to clean and safe drinking water remains a critical public health challenge in rural Malawi, where only 43% of households have reliable access to potable water. This systematic review synthesizes and critically evaluates existing literature on low-cost water purification technologies suitable for rural Malawian contexts, with a focused lens on affordability, sustainability, community adoption, and contextual integration. Traditional methods such as boiling, sedimentation, and chlorination, while culturally embedded, are significantly limited by economic constraints, fuel dependency, logistical hurdles, and persistent knowledge gaps. Emerging decentralized solutions, including bio-sand filters (BSFs), ceramic filters, and adsorbent-based systems, demonstrate considerable technical promise for pathogen and turbidity reduction. However, their long-term effectiveness faces substantial barriers related to material durability, maintenance complexity, supply chain reliability, and socio-cultural acceptance. Recent innovations leveraging locally sourced adsorbents such as rice husk ash, activated charcoal from agricultural waste, and iron oxide-rich red soil offer environmentally sustainable and economically viable pathways for decentralized water treatment. Crucially, this review argues that technological innovation alone is insufficient. Successful and sustainable implementation necessitates integrated, multi-sectoral approaches that synergistically combine appropriate technology design with comprehensive community training, participatory co-design, gender-sensitive planning, supportive local governance, and resilient infrastructure. The review concludes by identifying pivotal research gaps, including the urgent need for robust multi-contaminant removal systems, longitudinal field evaluations under real-world conditions, culturally adapted implementation frameworks, and scalability studies incorporating climate resilience. 
The findings underscore the importance of developing and deploying context-sensitive, low-cost, and community-owned water solutions as a foundation for achieving Sustainable Development Goal 6 (clean water and sanitation) and improving public health outcomes in Malawi.
Water purification, low-cost technologies, rural Malawi, adsorbents, bio-sand filters, ceramic filters, community engagement, sustainable WASH, appropriate technology, public health.
Youssef Alothman and Mohamed Bader-El-Den
1University of Portsmouth, UK
2Abdullah Al Salem University, Kuwait
The semiconductor manufacturing industry generates large volumes of highly imbalanced, non-stationary, and operationally critical textual data. Although transformer-based language models achieve strong classification accuracy, their robustness and probability calibration under industrial constraints remain insufficiently addressed, particularly in resource-limited deployments. This paper proposes LiteFormer, a lightweight and calibrated transformer framework for imbalanced industrial text classification. The approach integrates geometry-aware minority oversampling using D-SMOTE, imbalance-sensitive optimization through Focal Loss, and post-hoc temperature scaling for probability calibration. Experimental evaluation on a large-scale industrial Root Cause Analysis corpus demonstrates improved macro-F1 performance and substantially reduced Expected Calibration Error compared to standard transformer baselines, while maintaining computational efficiency. Results under temporal and domain shift further confirm stable performance and reliable confidence estimation.
Imbalanced text classification, lightweight transformers, probability calibration, focal loss, industrial NLP
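Three of the components named in the abstract above (Focal Loss, post-hoc temperature scaling, and Expected Calibration Error) have standard textbook definitions, sketched below in plain Python for a single example or a small batch. This is not LiteFormer itself: the function names, the default `gamma=2.0`, and the equal-width binning with `n_bins=10` are assumptions, and D-SMOTE is left out because the abstract gives no details about it.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (temperature scaling)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between accuracy and confidence over equal bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        low, high = b / n_bins, (b + 1) / n_bins
        members = [i for i, c in enumerate(confidences) if low < c <= high]
        if not members:
            continue
        accuracy = sum(correct[i] for i in members) / len(members)
        confidence = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(accuracy - confidence)
    return ece


logits = [2.0, 0.5, -1.0]
print(softmax(logits), softmax(logits, temperature=2.0))
print(focal_loss(softmax(logits), target=1))
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A temperature above 1 flattens the predicted distribution without changing the predicted class, which is why it can reduce calibration error while leaving accuracy untouched. In practice the temperature is fitted on a held-out validation set.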