Output Database Schema
Database Schema¶
The framework creates a DuckDB database with the following schema structure:
Schemas¶
- entity: Contains standardized entity information
  - name: Unique entity names with IDs
  - address: Unique addresses with IDs
  - street: Unique street information with IDs
  - street_name: Unique street names with IDs
  - name_similarity: TF-IDF similarity scores between entity names
  - street_name_similarity: TF-IDF similarity scores between entity addresses
- link: Contains match information between entities
  - {entity1}_{entity2}: Links between entities with match scores
- User-defined schemas: Contains the original data with cleaned fields
  - Tables as defined in your configuration
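To see which of these schemas and tables exist in a given output database, the catalog can be queried directly. The following is a minimal sketch, assuming the `duckdb` Python package is installed. The helper names `group_tables` and `list_tables` are illustrative and not part of the framework.

```python
from collections import defaultdict

# information_schema.tables is a standard catalog view that DuckDB exposes.
TABLES_SQL = """
SELECT table_schema, table_name
FROM information_schema.tables
ORDER BY table_schema, table_name
"""


def group_tables(rows):
    """Group (schema, table) pairs into {schema: [table, ...]}."""
    grouped = defaultdict(list)
    for schema, table in rows:
        grouped[schema].append(table)
    return dict(grouped)


def list_tables(db_path):
    """Open the output database read-only and list its tables by schema.

    db_path is whatever file your run produced; no default name is assumed.
    """
    import duckdb  # imported lazily so group_tables works without DuckDB

    con = duckdb.connect(db_path, read_only=True)
    try:
        return group_tables(con.execute(TABLES_SQL).fetchall())
    finally:
        con.close()
```

Calling `list_tables(...)` with the path to your database file should return the `entity` and `link` schemas alongside any user-defined ones, each mapped to its table names.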
Key Tables¶
entity.name¶
- entity: Standardized entity name
- name_id: Unique identifier for the entity name
entity.address¶
- entity: Standardized address
- address_id: Unique identifier for the address
entity.street¶
- entity: Standardized street
- street_id: Unique identifier for the street
entity.name_similarity¶
- entity_a: First entity name
- entity_b: Second entity name
- similarity: TF-IDF similarity score (0-1)
- id_a: ID of first entity
- id_b: ID of second entity
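A common use of this table is to pull the name pairs whose score clears a chosen threshold. The sketch below uses only the columns listed above. The function name `similar_names` and the default threshold of 0.8 are assumptions made for illustration, not framework defaults.

```python
# Columns come from the entity.name_similarity description above.
SIMILAR_NAMES_SQL = """
SELECT entity_a, entity_b, similarity
FROM entity.name_similarity
WHERE similarity >= ?
ORDER BY similarity DESC
"""


def similar_names(con, threshold=0.8):
    """Return (entity_a, entity_b, similarity) rows at or above threshold.

    con is an open connection whose execute(sql, params) result supports
    fetchall(), such as one returned by duckdb.connect(...).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return con.execute(SIMILAR_NAMES_SQL, [threshold]).fetchall()
```

With DuckDB, the connection could be opened as `duckdb.connect(path, read_only=True)`, where `path` points at your output database.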
entity.street_name_similarity¶
- entity_a: First entity address
- entity_b: Second entity address
- similarity: TF-IDF similarity score (0-1)
- id_a: ID of first entity
- id_b: ID of second entity
link.{entity1}_{entity2}¶
- {entity1}_{id1}: ID from first entity
- {entity2}_{id2}: ID from second entity
- Various match columns with binary (0/1) or similarity scores (0-1)
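Because the link table and its ID columns are named from your configuration, a query against them has to be assembled from those names. The sketch below shows one way to do that. It assumes `{id1}` and `{id2}` are the ID field names of the two entities, and the example names `company`, `supplier` and `id` are hypothetical.

```python
def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def link_query(entity1, id1, entity2, id2):
    """Build a SELECT over link.{entity1}_{entity2} returning both ID columns.

    The naming pattern follows the table description above; the match
    columns vary by configuration, so they are not listed here.
    """
    table = "link." + quote_ident(f"{entity1}_{entity2}")
    col1 = quote_ident(f"{entity1}_{id1}")
    col2 = quote_ident(f"{entity2}_{id2}")
    return f"SELECT {col1}, {col2} FROM {table}"
```

For example, `link_query("company", "id", "supplier", "id")` produces a query over `link."company_supplier"` that selects `"company_id"` and `"supplier_id"`. Replacing the column list with `*` would also return the match columns.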