
Primeur weekly 2020-06-29

Quantum computing

Atos takes the most powerful quantum simulator in the world to the next level with Atos QLM E ...

New model helps to describe defects and errors in quantum computers ...

Research and Markets to issue 2020 Revenue Assessment for Post-Quantum Cryptography (PQC) report ...

Focus on Europe

Iceland joined the LUMI consortium ...

LRZ to deploy HPE's Cray CS500 system to bring innovative architecture to the science of its users ...

Outcome of the e-IRG Webinar Series organized in the framework of the Croatian EU Presidency on 25-26 May 2020: Grand challenges of e-infrastructures within the new ERA ...

Spring 2020 edition of the e-IRG Magazine available ...

MareNostrum will generate a language model in Spanish based on millions of digital contents from the National Library of Spain ...

A team of bioengineering and computing experts outlines how the creation of text mining and Artificial Intelligence tools could advance biomaterials research and development ...

Barcelona Supercomputing Center releases OmpSs-2 version 2020.06 ...

Barcelona Supercomputing Center releases COMPSs version 2.7 ...

Ghent University engages Atos to build third Tier1 VSC supercomputer supporting research in Flanders ...

Dell Technologies high performance computing customers drive breakthroughs for global impact ...

Hardware

NCSA and Southern Methodist University announce new strategic partnership ...

TYAN brings the latest server advancements at its 2020 Server Solutions Online Exhibition ...

WekaIO awarded a patent for Flash Registry with Write Leveling ...

Western Digital's New NVMe SSDs and NVMe-oF solutions provide the foundation for next-generation, agile data infrastructures ...

CW-WDM MSA Group forms to drive new industry standard for optical laser sources ...

Battelle awarded contract to enable Arctic research ...

Applications

Project to calculate emissions reductions across Europe during the COVID-19 pandemic ...

Supercomputer simulations show how DNA prepares itself for repair ...

Georgia Tech engineers simulate solar cell work using supercomputers ...

A new way of designing global satellite missions ...

Process for 'two-faced' nanomaterials may aid energy, information tech ...

New transatlantic lab decodes the brain with AI ...

Ion conducting polymer crucial to improving neuromorphic devices ...

Scientists develop new tool to design better fusion devices ...

C3.ai Digital Transformation Institute announces COVID-19 awards ...

UCF Consortium announces OpenSNAPI project to develop an open, standard application programming interface for smart networking adapters ...

The Cloud

SDSC's Sherlock Cloud announces Skylab ...

Dell Technologies brings IT infrastructure and Cloud capabilities to Edge environments ...

New technique may enable all-optical data-centre networks ...

Schrödinger expands discovery efforts for COVID-19 Alliance with advanced molecular simulation leveraging high-powered parallel computing on Google Cloud ...

Nebulon emerges from stealth and announces Cloud-Defined Storage ...

Repertoire Immune Medicines receives funding from COVID-19 High Performance Computing Consortium ...

W3BCLOUD raises $20,5 million to roll out a network of data centres dedicated for the blockchain economy ...

MareNostrum will generate a language model in Spanish based on millions of digital contents from the National Library of Spain


This project is part of a commission to the BSC from the Secretary of State for Digital and Artificial Intelligence Advancement, within the framework of the Plan to promote language technologies.
22 Jun 2020 Barcelona - The MareNostrum supercomputer has started to receive a vast amount of data from the Web archive of the National Library of Spain, which will serve as the basis for generating a model of the Spanish language and of the other languages of the state. The Spanish web archive is the collection of websites under the .es domain - including blogs, forums, documents, images, videos, etc. - plus sites under other domains that are considered documentary heritage, all collected to preserve the Spanish documentary heritage on the Internet and to ensure access to it. The Barcelona Supercomputing Center (BSC) is responsible for carrying out the project, as commissioned by the Secretary of State for Digital and Artificial Intelligence Advancement (SEDIA), within the framework of the Plan to promote language technologies.

The task is twofold: transporting the data to the supercomputer, and processing it to generate a language model. MareNostrum began storing content some weeks ago, after a process was developed to extract textual data from the library's Web archive, which allowed the content to be transferred to the BSC promptly. Transmitting this enormous quantity of data was one of the significant challenges of the initiative. So far, the supercomputer has stored 45 terabytes.
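
The article does not describe how the BSC extraction process works. As a rough sketch of what extracting textual data from archived web pages can involve, the example below assumes each page is already available as an HTML string and uses only Python's standard library. The class name `TextExtractor`, the function `extract_text` and the sample page are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the content of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of one HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Invented sample page, standing in for one archived document.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Hola</h1><p>Esto es un ejemplo.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(sample))  # prints: Hola Esto es un ejemplo.
```

A real pipeline over a web archive would also have to read the archive's container format, detect character encodings and remove duplicate pages, none of which this sketch attempts.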

The next step will be to process this data to generate language models using natural language processing technologies. Such resources already exist for English; the best known is Google's BERT, which has been a milestone in natural language processing. The model on which the BSC is working stands out from other Spanish language model initiatives because of the quantity of Spanish linguistic data it contains, which makes it more precise and more practical for use across applications.

Language models reproduce language use and allow us to know the real meaning of words, even in whole sentences, since the data is contextualized and therefore carries more information and sense. This makes it possible to disambiguate the sense of words - for instance, to distinguish between the meaning of sick in 'This is sick!' and in 'I'm feeling sick'. It also allows us to interpret ideological bias, and it opens the way to dealing with irony and figurative meaning. It also endows artificial intelligence systems with common sense.
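
The 'sick' example can be illustrated with a toy sketch. This is not how BERT or the BSC model works: those models learn contextual representations from large amounts of text, whereas the code below simply picks the sense whose hand-written cue words overlap most with the rest of the sentence (a simplified Lesk-style heuristic). The sense labels and cue words are invented for the example.

```python
import re

# Hypothetical sense inventory for "sick"; the cue words are invented
# for illustration and are not taken from any real lexicon.
SENSES = {
    "ill": {"feeling", "feel", "doctor", "fever", "i'm", "stomach"},
    "impressive": {"this", "that", "trick", "wow", "awesome"},
}


def tokenize(sentence):
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def disambiguate(sentence, target="sick"):
    """Pick the sense whose cue words overlap most with the context.

    Ties are resolved in favour of the sense listed first in SENSES.
    """
    context = set(tokenize(sentence)) - {target}
    scores = {sense: len(context & cues) for sense, cues in SENSES.items()}
    return max(scores, key=scores.get)


print(disambiguate("This is sick!"))     # prints: impressive
print(disambiguate("I'm feeling sick"))  # prints: ill
```

The point of the sketch is only that the surrounding words decide the reading; a trained language model captures this without any hand-written lists.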

Quim Moré, a researcher in the CASE department of the BSC, and David Vicente, team manager of the Operations group, are responsible for this project. According to Quim Moré: "The generation of language models is vital to artificial intelligence. The computer application of a disambiguated language model with a context founded in our world knowledge means a great advance in the generation of smarter and closer systems".

The applications of this model are diverse: from automatic translation and cybersecurity to a robot describing the content of a 15th-century painting. However, models capable of driving this revolution require computational and data resources that only a few centres and companies, such as Google or Facebook, have.

In this regard, Quim Moré highlighted that "we are lucky that MareNostrum has the computing capacity that we need, and on the other hand, we have the huge amount of linguistic data reviewed and provided by the National Library. We have a great opportunity to be on the same level as the great centres of artificial intelligence and also to provide a computational application of linguistic knowledge to culture".

The Spanish web archive is the collection of websites under the .es domain and others - including blogs, forums, documents, images, videos, etc. - that are collected in order to preserve the Spanish documentary heritage on the Internet and to ensure access to it. December 2019 marked the 10th anniversary of the launch of the Spanish web archive project. Since the launch, the Spanish National Library has strengthened its infrastructure, policies and processes for preserving online heritage, just as the most important national libraries have been doing for years.
Source: Barcelona Supercomputing Center - BSC
