GDS Technical Architecture
This document provides a technical overview of the Geometrodynamic Semantics (GDS) research prototype architecture as implemented.
System Design Philosophy
GDS models semantic reasoning as a physics-inspired process operating over a hyperdimensional concept space. Rather than treating language as statistical patterns, the system represents meaning through:
- Semantic particles: Concepts encoded as 20,000-dimensional binary HDC vectors
- Geometric relationships: Knowledge graph edges with physics-inspired properties (mass, valence, affective dimensions)
- Geodesic reasoning: Path-finding through semantic space using composite cost functions
Core Architecture Layers
1. Semantic Base Layer (Static)
The foundational lexicon built from multiple knowledge sources:
- Storage Format: Parquet with ZSTD compression (~7.5GB for 340k concepts)
- Conceptual State Injector (CSI-HDC): Replaces traditional tokenization with 20,000-dimensional binary hypervectors
- Vector Expansion: Julia HDC server generates concept hypervectors from 300D embeddings
- Graph Structure: Curated assertions (ConceptNet) + proximity edges (k-NN via FAISS binary indexing)
Key Properties Per Concept:
- particle_id, concept_id, lemma, language
- m0 (semantic mass / centrality)
- VAD (Valence-Arousal-Dominance) affective dimensions
- 20,000-bit binary HDC vector
- Graph connectivity (edges with relation types)
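As a rough sketch, one lexicon row can be pictured as the Rust struct below. The field names mirror the properties just listed, but the exact types, the `Edge` shape, and the packed-vector layout are illustrative assumptions, not the actual Parquet schema.

```rust
/// Illustrative in-memory view of one lexicon row (field names follow the
/// properties listed above; the real Parquet schema may differ).
pub struct SemanticParticle {
    pub particle_id: u64,
    pub concept_id: String, // e.g. "en/cat"
    pub lemma: String,
    pub language: String,   // "en", "ro", ...
    pub m0: f32,            // semantic mass / centrality
    pub vad: [f32; 3],      // valence, arousal, dominance
    pub hdc: Vec<u64>,      // 20,000-bit vector packed into 313 u64 words
    pub edges: Vec<Edge>,   // outgoing graph connectivity
}

pub struct Edge {
    pub target: u64,        // particle_id of the neighbour
    pub relation: String,   // relation type, e.g. "IsA" (ConceptNet-style)
    pub weight: f32,
}
```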
2. Dynamic Context Overlay (Runtime)
LMDB-backed delta store for runtime learning:
- Edge weight adjustments during reasoning sessions
- Degree-normalized updates to maintain graph stability (see the sketch after this list)
- Checkpointing for session recovery
- Telemetry monitoring (drift detection, margin analysis)
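A minimal sketch of the degree-normalized update, with an in-memory map standing in for the LMDB-backed store; the specific normalization rule and the names `OverlayStore`/`apply_delta` are illustrative assumptions.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the LMDB-backed delta store (illustrative only;
/// the real overlay is persistent and checkpointed for session recovery).
pub struct OverlayStore {
    deltas: HashMap<(u64, u64), f32>, // (source, target) -> learned weight delta
}

impl OverlayStore {
    pub fn new() -> Self {
        Self { deltas: HashMap::new() }
    }

    /// Degree-normalized update: the raw adjustment is damped by the source
    /// node's degree so high-degree hubs drift slowly (an assumed rule).
    pub fn apply_delta(&mut self, src: u64, dst: u64, raw: f32, src_degree: usize) {
        let scaled = raw / (1.0 + (src_degree as f32).sqrt());
        *self.deltas.entry((src, dst)).or_insert(0.0) += scaled;
    }

    /// The λ·overlay term consumed by the geodesic cost function.
    pub fn overlay(&self, src: u64, dst: u64) -> f32 {
        self.deltas.get(&(src, dst)).copied().unwrap_or(0.0)
    }
}
```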
3. Reasoning Engine
Path-based semantic reasoning through the concept graph:
- Geodesic solver: Multi-component cost function (see the sketch after this list) combining:
  - α·inv_m0: Semantic mass preference (favor central concepts)
  - β·vad: Affective dimension matching
  - γ·rel: Relation type appropriateness
  - λ·overlay: Runtime-learned preferences
- Explainability: JSON traces (gds.explain.path@v1) documenting:
  - Traversed nodes and edges
  - Cost breakdown per component
  - Reasoning margin (confidence)
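A hedged sketch of how the composite edge cost might be assembled. The component names match the list above, but the structs and signatures are illustrative rather than the engine's actual API.

```rust
/// Tunable weights for the multi-component cost (names mirror the list above).
pub struct CostWeights {
    pub alpha: f32,  // semantic mass preference
    pub beta: f32,   // affective (VAD) matching
    pub gamma: f32,  // relation-type appropriateness
    pub lambda: f32, // runtime overlay term
}

/// Per-edge cost breakdown; the same numbers would feed the per-component
/// section of a gds.explain.path@v1 trace.
pub struct CostBreakdown {
    pub inv_m0: f32,  // 1 / m0 of the target concept (favors central nodes)
    pub vad: f32,     // distance between query VAD and target VAD
    pub rel: f32,     // penalty for inappropriate relation types
    pub overlay: f32, // learned delta from the runtime overlay
}

impl CostBreakdown {
    /// Total geodesic edge cost: α·inv_m0 + β·vad + γ·rel + λ·overlay.
    pub fn total(&self, w: &CostWeights) -> f32 {
        w.alpha * self.inv_m0 + w.beta * self.vad + w.gamma * self.rel + w.lambda * self.overlay
    }
}
```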
4. Learning Subsystem
Physics-inspired autonomous learning:
- Local Hebbian updates: Edge strengthening based on usage patterns (sketched after this list)
- Validation gates: Telemetry-based checks before consolidating learned changes
- Diffusion smoothing: L2 decay prevents overfitting to specific paths
- Margin-gain optimization: Focus learning on high-uncertainty decisions
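One plausible reading of the update rule combines a Hebbian reinforcement term with L2 decay; the exact formulation and the hyperparameters here are assumptions for illustration.

```rust
/// One illustrative learning step for a single edge weight: Hebbian
/// reinforcement when the edge was used on a successful path, plus L2 decay
/// pulling the weight back toward zero so no single path is overfit.
/// `eta` (learning rate) and `decay` are assumed hyperparameters.
fn hebbian_step(weight: f32, edge_used: bool, eta: f32, decay: f32) -> f32 {
    let reinforcement = if edge_used { eta } else { 0.0 };
    // w <- w + η·activity − κ·w   (κ·w is the L2 / diffusion-smoothing term)
    weight + reinforcement - decay * weight
}
```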
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Core Logic | Rust | Reasoning engine, graph operations, I/O |
| Vector Expansion | Julia | HDC hypervector generation from embeddings |
| Storage | Parquet + ZSTD | Compressed columnar lexicon storage |
| Overlay | LMDB | Runtime delta store with checkpointing |
| Indexing | FAISS (binary) | k-NN similarity search for graph construction |
| FFI | JSON over stdin/stdout | Rust ↔ Julia communication |
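The FFI row can be pictured as line-delimited JSON over the Julia child process's standard streams. A minimal Rust sketch follows; the script name `hdc_server.jl` and the message shape are assumptions, not the prototype's actual protocol.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn the Julia HDC server (script path is illustrative).
    let mut child = Command::new("julia")
        .arg("hdc_server.jl")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // One JSON request per line; the field names are an assumed shape.
    let request = r#"{"op":"expand","concept":"en/cat"}"#;
    writeln!(child.stdin.as_mut().unwrap(), "{}", request)?;

    // Read one line of JSON back (e.g. the packed 20,000-bit vector).
    let mut reader = BufReader::new(child.stdout.as_mut().unwrap());
    let mut response = String::new();
    reader.read_line(&mut response)?;
    println!("julia replied: {}", response.trim());
    Ok(())
}
```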
Data Sources
The research prototype integrates multiple knowledge bases:
- ConceptNet 5: ~2.1M curated semantic assertions (140k concepts after filtering)
- ConceptNet Numberbatch: 300D multilingual embeddings
- NRC-VAD Lexicon: Affective dimensions for emotional reasoning
- OEWN (Open English WordNet): Lexical taxonomy and relationships
Implementation Highlights
Storage Optimization Journey
The project achieved ~800x storage reduction through progressive optimization:
- Initial projection: 6TB for full lexicon (300D floats + metadata)
- Binary HDC encoding: Reduced to ~68GB (20k-bit vectors)
- Parquet columnar storage: Final ~7.5GB (with ZSTD compression)
CSI-HDC Tokenization
Conceptual State Injector using Hyperdimensional Computing (CSI-HDC) replaces traditional neural tokenizers:
- Input: Concept identifier (e.g., "en/cat", "ro/calculator")
- Process: Julia HDC server expands 300D embedding → 20,000D binary vector (see the sketch after this list)
- Output: Semantic particle ready for graph-based reasoning
- Advantage: Interpretable, multilingual, compositional
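A minimal sketch of the expansion step, assuming a sign-thresholded random projection from 300D to 20,000 bits; the Julia server's actual encoding may differ. The Hamming similarity at the end is the natural comparison for such bit vectors.

```rust
const DIM_IN: usize = 300;                 // Numberbatch embedding dimension
const DIM_OUT: usize = 20_000;             // hypervector bits
const WORDS: usize = (DIM_OUT + 63) / 64;  // 313 packed u64 words

/// Expand a 300D embedding into a 20,000-bit hypervector by a fixed random
/// projection followed by a sign threshold (an assumed scheme).
/// `projection` is a fixed DIM_OUT x DIM_IN matrix.
fn expand(embedding: &[f32; DIM_IN], projection: &[[f32; DIM_IN]]) -> Vec<u64> {
    assert_eq!(projection.len(), DIM_OUT);
    let mut bits = vec![0u64; WORDS];
    for (i, row) in projection.iter().enumerate() {
        let dot: f32 = row.iter().zip(embedding.iter()).map(|(a, b)| a * b).sum();
        if dot >= 0.0 {
            bits[i / 64] |= 1u64 << (i % 64);
        }
    }
    bits
}

/// Hamming similarity between packed hypervectors (1.0 = identical).
fn similarity(a: &[u64], b: &[u64]) -> f32 {
    let differing: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - differing as f32 / DIM_OUT as f32
}
```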
Cross-Language Capabilities
The system demonstrates multilingual semantic reasoning:
- English: Primary language (140k concepts)
- Romanian: Secondary language (~200k concepts)
- Shared HDC space: Cross-lingual concept alignment via Numberbatch embeddings
- Relation preservation: Language-agnostic graph structure
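Because both languages share one 20,000-bit space, cross-lingual comparison is the same bit-level operation as monolingual comparison. The snippet below illustrates this with synthetic stand-in vectors, not real lexicon data.

```rust
/// Illustrative only: cross-lingual lookup is the same Hamming comparison as
/// monolingual lookup. The two vectors are synthetic stand-ins.
fn main() {
    let words = 313; // 20,000 bits packed into 64-bit words
    let cat_en = vec![0xAAAA_AAAA_AAAA_AAAAu64; words]; // stand-in for "en/cat"
    let cat_ro = vec![0xAAAA_AAAA_AAAA_AABAu64; words]; // stand-in for its Romanian counterpart
    let differing: u32 = cat_en.iter().zip(&cat_ro).map(|(a, b)| (a ^ b).count_ones()).sum();
    let similarity = 1.0 - differing as f32 / 20_000.0;
    println!("cross-lingual similarity: {similarity:.4}"); // ~0.98 here; aligned concepts score high
}
```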
Current Research Status
GDS is a proof-of-concept prototype demonstrating:
✅ Operational CSI-HDC tokenization (20k-dimensional binary vectors)
✅ Functional graph-based reasoning with explainable paths
✅ Autonomous learning gates with validation
✅ Efficient storage (Parquet + LMDB architecture)
✅ Multilingual capabilities (EN/RO)
This architecture represents an exploration of physics-inspired semantic reasoning, not a production system. All design decisions prioritize research interpretability and experimental flexibility.
Further Reading
- Model Card: Detailed specifications and capabilities
- Development Chronicle: Engineering journey and optimizations
- Citations: Academic references and data sources
This architecture serves the Independent Research & Development Genesis initiative exploring novel AI paradigms. For questions or collaboration: mihai.mateescu@web.de