Output Database Schema
Database Schema
The framework creates a DuckDB database with the following schema structure:
Schemas
- entity: Contains standardized entity information
  - name: Unique entity names with IDs
  - address: Unique addresses with IDs
  - street: Unique street information with IDs
  - street_name: Unique street names with IDs
  - name_similarity: TF-IDF similarity scores between entity names
  - street_name_similarity: TF-IDF similarity scores between entity addresses
- link: Contains match information between entities
  - {entity1}_{entity2}: Links between entities with match scores
- User-defined schemas: Contain the original data with cleaned fields
  - Tables as defined in your configuration
Key Tables
entity.name
- entity: Standardized entity name
- name_id: Unique identifier for the entity name
entity.address
- entity: Standardized address
- address_id: Unique identifier for the address
entity.street
- entity: Standardized street
- street_id: Unique identifier for the street
entity.name_similarity
- entity_a: First entity name
- entity_b: Second entity name
- similarity: TF-IDF similarity score (0-1)
- id_a: ID of first entity
- id_b: ID of second entity
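
As an illustration of how this table might be queried, the sketch below selects name pairs whose similarity is at or above a threshold. It makes several assumptions: Python's built-in sqlite3 module is used as a stand-in for DuckDB so the example runs without extra packages (an attached in-memory database named `entity` plays the role of the schema), the column types are guesses, the sample rows are made up, and `similar_names` is a hypothetical helper that is not part of the framework.

```python
import sqlite3

# Stand-in for the framework's DuckDB output. SQLite has no CREATE SCHEMA,
# so an attached in-memory database named "entity" plays that role here.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS entity")

# Column names follow the list above; the types are assumptions.
con.execute(
    """
    CREATE TABLE entity.name_similarity (
        entity_a TEXT,
        entity_b TEXT,
        similarity REAL,
        id_a INTEGER,
        id_b INTEGER
    )
    """
)

# Made-up sample rows, for illustration only.
con.executemany(
    "INSERT INTO entity.name_similarity VALUES (?, ?, ?, ?, ?)",
    [
        ("ACME LTD", "ACME LIMITED", 0.91, 1, 2),
        ("ACME LTD", "APEX LTD", 0.42, 1, 3),
        ("GLOBEX INC", "GLOBEX INCORPORATED", 0.88, 4, 5),
    ],
)


def similar_names(con, threshold):
    """Return (id_a, id_b, similarity) for pairs at or above the threshold."""
    return con.execute(
        """
        SELECT id_a, id_b, similarity
        FROM entity.name_similarity
        WHERE similarity >= ?
        ORDER BY similarity DESC
        """,
        (threshold,),
    ).fetchall()


pairs = similar_names(con, 0.8)
for id_a, id_b, score in pairs:
    print(id_a, id_b, score)
```

The SELECT statement uses only standard SQL, so it should carry over to the DuckDB file itself; only the connection setup and the ATTACH workaround are specific to the stand-in.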
entity.street_name_similarity
- entity_a: First entity address
- entity_b: Second entity address
- similarity: TF-IDF similarity score (0-1)
- id_a: ID of first entity
- id_b: ID of second entity
link.{entity1}_{entity2}
- {entity1}_{id1}: ID from first entity
- {entity2}_{id2}: ID from second entity
- Additional match columns, each holding either a binary flag (0/1) or a similarity score (0-1)
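
To make the naming pattern concrete, here is a minimal sketch that fills in the placeholders and filters a link table on its match columns. Everything specific in it is an assumption: the entity names `charity` and `company`, the ID column name `id`, and the match columns `name_match` and `address_similarity` are hypothetical, the rows are made up, and Python's built-in sqlite3 module again stands in for DuckDB, with an attached in-memory database named `link` acting as the schema.

```python
import sqlite3

# Hypothetical values for the {entity1}, {id1}, {entity2}, {id2} placeholders.
ENTITY1, ID1 = "charity", "id"
ENTITY2, ID2 = "company", "id"

table = f"link.{ENTITY1}_{ENTITY2}"  # link.charity_company
col1 = f"{ENTITY1}_{ID1}"            # charity_id
col2 = f"{ENTITY2}_{ID2}"            # company_id

# SQLite stand-in: an attached in-memory database plays the "link" schema.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS link")

# name_match (0/1) and address_similarity (0-1) are hypothetical match
# columns; the real names depend on your configuration.
con.execute(
    f"""
    CREATE TABLE {table} (
        {col1} INTEGER,
        {col2} INTEGER,
        name_match INTEGER,
        address_similarity REAL
    )
    """
)

# Made-up sample rows, for illustration only.
con.executemany(
    f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
    [
        (10, 200, 1, 0.95),
        (11, 201, 0, 0.30),
        (12, 202, 1, 0.55),
    ],
)

# Keep links with an exact name match and a reasonably similar address.
strong_links = con.execute(
    f"""
    SELECT {col1}, {col2}
    FROM {table}
    WHERE name_match = 1 AND address_similarity >= ?
    ORDER BY {col1}
    """,
    (0.5,),
).fetchall()

print(strong_links)
```

Table and column names are interpolated into the SQL text because identifiers cannot be bound as parameters, so they should only ever come from trusted configuration. The threshold is passed as a bound parameter.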