Prologue

Originally, this was part of my previous post on knowledge graphs (KGs), but as I kept writing, I realized it deserved its own space. This isn’t just about KG construction anymore; it’s about my personal journey through the field, the roadblocks I faced, and, ultimately, the perspective I’ve gained. There is more I want to say, but I’ll have to churn it out slowly; my mind has been quite messy these past few weeks.

My Experience Working on Autonomous KG Construction

From my direct engagement with semantic KG research, I can say with confidence that diving into this field alone and without prior NLP experience feels like stepping into a dead-end maze. The sheer complexity, the lack of concrete solutions, and the overwhelming reliance on trial-and-error methodologies make it one of the most frustrating areas to navigate.

Even with the rise of larger models, the task remains far from solved. Instead of real breakthroughs, the field often feels like an endless cycle of remixing existing components, running ablation studies, and tweaking architectures without solid theoretical backing, all in pursuit of benchmark scores that fail to translate into real-world utility.

My personal experience working on autonomous KG construction with limited time, resources, and external guidance has been nothing short of a nightmare. While I have learned a great deal (mostly about what doesn’t work), the process has been an exercise in frustration. The brittleness of current methods, especially in LLM-centric approaches, makes it clear that without fine-tuning, safeguards, and heavy-handed interventions, the entire system collapses under its own weight.

And that’s where the paradox begins: the field embraces probabilistic approaches (LLMs, embeddings), yet it constantly relies on heuristics (fine-tuning, manual corrections, safeguards) to keep them from failing. It claims to move away from rule-based systems, yet in practice it reintroduces rules at every step to compensate for the lack of structure.

With that being said, I don’t think this is the end of the field; far from it. But the sheer number of trial-and-error studies with minimal theoretical backing is exhausting. I just wish researchers were more transparent about the limitations. I can imagine highly specialized teams, equipped with domain expertise and years of experience, making these systems work in specific fields. However, what truly bothers me is how some studies try to sell these systems as an alternative to naive RAG (Retrieval-Augmented Generation) when, in practice, they are far from being commercially viable.

For commercial applications, these systems demand heavy investment, and even with what I’ve seen in the literature, I struggle to see the payoff. Maybe there’s a hidden framework, a well-kept secret sauce that allows knowledge graphs to work at scale in domain-specific settings, but in my opinion, this is not a field you can just “enter” and innovate. Most approaches have already been explored and exhausted; if you want to do something novel, you either:

  • Reblend existing components and run more ablation studies (the game everyone is playing),
  • Over-engineer a system with endless tweaks just to make it work, or
  • Introduce a complete paradigm shift (which rarely happens).

In my opinion? RAG wins. The paradigm has never truly shifted, and at this point, I question whether it ever will.

Going back to automatic KG construction, I think it’s fair to say that, for now, it remains very much a toy, not unlike the smaller language models I’ve worked with, especially those under 7B parameters and heavily quantized. While I fully support the push for smaller, local models, I believe they still need significant improvements before they can be reliably integrated into more elaborate KG systems.

Some key areas that need serious progress include:

  • Better instruction following – Many small models still struggle with nuanced multi-step reasoning.

  • More refined long-context understanding – Handling and synthesizing structured knowledge across thousands of tokens is still a challenge.

  • Robust function calling – Essential for grounding responses in external tools or structured logic.
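To make that last point a bit more concrete, here is a minimal sketch of the kind of function calling I have in mind for KG construction: forcing the model to emit subject-relation-object triples through a fixed JSON schema instead of free text. Everything in it is an assumption for illustration, including the OpenAI-compatible local endpoint (the sort of server llama.cpp or Ollama expose), the placeholder model name, and the `extract_triples` tool; none of it comes from a specific framework.

```python
# Rough sketch: ask a local, OpenAI-compatible model to return KG triples
# through a function-calling schema rather than free-form text.
# The endpoint URL, model name, and tool definition are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Hypothetical tool: constrain the answer to (subject, relation, object) triples.
extract_triples_tool = {
    "type": "function",
    "function": {
        "name": "extract_triples",
        "description": "Record subject-relation-object triples found in the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "triples": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "subject": {"type": "string"},
                            "relation": {"type": "string"},
                            "object": {"type": "string"},
                        },
                        "required": ["subject", "relation", "object"],
                    },
                }
            },
            "required": ["triples"],
        },
    },
}

text = "Marie Curie discovered polonium in 1898."
response = client.chat.completions.create(
    model="local-7b-instruct",  # placeholder for whatever the local server exposes
    messages=[{"role": "user", "content": f"Extract triples from: {text}"}],
    tools=[extract_triples_tool],
    tool_choice={"type": "function", "function": {"name": "extract_triples"}},
)

# If the model honors the schema, the arguments parse cleanly; small quantized
# models often don't, which is exactly the brittleness described above.
call = response.choices[0].message.tool_calls[0]
print(json.loads(call.function.arguments)["triples"])
```

Whether a heavily quantized sub-7B model actually honors a schema like this on long, messy documents is precisely the open question; in my experience, that is where the safeguards and manual corrections start creeping back in.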

Epilogue

I can’t help but reflect on something Yann LeCun once said; it has truly grown on me. He has long been skeptical of LLMs, and his critiques have started to resonate with me more the deeper I go into this field.

After working with open models outside of proprietary, commercial systems, especially in production settings, I’ve found myself asking: Is this really the direction we should keep pushing? Will the bubble eventually pop? The more I engage with KG construction, the clearer it becomes: LLMs and determinism do not belong in the same sentence. Yet determinism is at the very core of knowledge graphs.

If that’s the case, then I have to wonder: is the graph-based approach even aligned with where the field is headed? Given that much of the recent literature focuses on enterprise applications and chatbot integrations, is the future of structured knowledge in LLMs truly in graphs, or are we just forcing an outdated paradigm into an incompatible system?

Of course, this is just my opinion; I don’t claim to know what comes next. But one thing is certain: this experience has changed me. I’ve grown through these challenges, and while I remain uncertain about the field’s trajectory, I now see it with far more nuance than when I started.

As I wrap up this study, I find myself ready to move on, not just from this specific problem, but from the broader approach I’ve been taking. Even if it has been frustrating at times, this experience has given me something invaluable: clarity.

With that, I think my next steps will be different. I want to return to the foundations, to something more concrete and logical: perhaps a shift toward mathematics, formal methods, or something with a stronger theoretical backbone. This experience has shown me that working “with” LLMs often feels like engineering, while working “on” them feels like research. And if I’m going to keep moving forward, I’d rather be on the side of understanding and shaping these systems than just patching them together and hoping they hold.