Processing structured data for the Knowledge Graph

Reddi1
Posts: 369
Joined: Thu Dec 26, 2024 3:11 am

Post by Reddi1 »

Google's primary sources of information about entities are those that provide it with structured data.

In this article I will deal only with this type of data source. I will discuss the far more complex methods for extracting information from unstructured and semi-structured data, such as Wikipedia, in subsequent articles.

Google can record structured data using the Resource Description Framework (RDF for short). An entity is then a summary of various RDF statements that follow the subject-predicate-object pattern. One such statement would be, for example, "Canberra is the capital of Australia."

This statement can be broken down grammatically as follows: Canberra is the subject, Australia is the object, and "(is the) capital (of)" is the predicate. The relationship can also be expressed by a verb, as in "Thomas Müller plays for Bayern Munich." In statements like these, the subject and the object are both entities. The predicate can be an entity type or class, an attribute, a verb, or a combination of these.
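To make the pattern concrete, here is a minimal sketch in plain Python of how such statements could be stored as triples and grouped into one summary per entity. The Triple type, the summarize_entities helper, and the extra "is a City" fact are my own illustrations; this is not how Google stores its data.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in subject-predicate-object form (illustrative type)."""
    subject: str
    predicate: str
    obj: str


# The two statements from the text, plus one extra illustrative fact.
triples = [
    Triple("Canberra", "is capital of", "Australia"),
    Triple("Thomas Müller", "plays for", "Bayern Munich"),
    Triple("Canberra", "is a", "City"),
]


def summarize_entities(statements):
    """Group statements by subject so each entity collects its own facts."""
    entities = defaultdict(list)
    for t in statements:
        entities[t.subject].append((t.predicate, t.obj))
    return dict(entities)


entities = summarize_entities(triples)
```

Looking up entities["Canberra"] then returns every (predicate, object) pair recorded for that subject, which is the "summary of statements" idea in miniature.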

Most structured databases provide their information in machine-readable RDF format or allow it to be translated into this format. Google uses sources it trusts, such as Wikidata and the CIA World Factbook, along with structured data sets and translation databases such as DBpedia or YAGO, which convert Wikipedia information into machine-readable data.
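As a rough sketch of what "translation into this format" can look like, the snippet below turns one database-style record into lines of N-Triples, a simple line-based RDF serialization. The example.org namespace, the field names, and the record_to_ntriples helper are assumptions made up for this illustration; real sources such as Wikidata or DBpedia use their own identifiers and vocabularies.

```python
BASE = "http://example.org/"  # hypothetical namespace, not a real vocabulary


def record_to_ntriples(entity_id, record, entity_fields=()):
    """Emit one N-Triples line per field of a record.

    Fields named in entity_fields point to other entities (written as IRIs);
    all other values are written as plain string literals.
    """
    lines = []
    for key, value in record.items():
        subject = f"<{BASE}{entity_id}>"
        predicate = f"<{BASE}{key}>"
        if key in entity_fields:
            obj = f"<{BASE}{value}>"
        else:
            # Escape backslashes and quotes so the literal stays well-formed.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


row = {"capitalOf": "Australia", "label": "Canberra"}
lines = record_to_ntriples("Canberra", row, entity_fields={"capitalOf"})
```

Each resulting line is one subject-predicate-object statement, so a whole table of records becomes a flat list of statements that can be merged with statements from other sources.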

Since databases and data sets with structured data grow and are updated relatively slowly, it is not surprising that Google repeatedly encourages webmasters to use structured data on their websites. The more structured data Google collects and processes, the closer it comes to the goal of being able to process unstructured data as well. The structured data serves as training data for machine learning.

You can read more about this in my article "Why structured data could become obsolete for Google in the future."