Rethinking the Human Core in a Machine World
The World Economic Forum's “Future of Jobs Report 2025” forecasts that 39% of core worker skills will change by 2030, driven by automation, AI, and climate disruption.
These "future skills" transcend traditional job competencies. They are layered, interdependent, and distributed across three domains:
- Technological: AI literacy, data fluency, cybersecurity
- Cognitive: Critical analysis, creativity, problem-solving
- Socio-emotional: Empathy, adaptability, collaboration
These skills cannot be meaningfully assessed through traditional exams or standardized testing. They require dynamic, contextual, and often collaborative evaluations. This is where AI benchmarks, paradoxically, become a new reference point for assessing intelligence, helping us understand which skills machines can replicate and which remain uniquely human.
Humanity’s Last Exam (HLE): A Catalyst for General Intelligence
Developed by the Center for AI Safety and Scale AI, HLE is a 2,500-question benchmark across 100-plus disciplines. Designed to challenge even the most advanced large language models (LLMs), HLE pushes beyond memorization into deep reasoning, multistep problem-solving, and multimodal integration.
The new Grok 4's current score (about 45% for the Heavy variant) shows rapid improvement from under 10% just months prior, but it still falls short of human Ph.D.-level performance (about 50–60%).
But HLE’s true value lies in highlighting what AI finds hard, thereby guiding what humans should focus on: interpretation, moral framing, interdisciplinary thinking, and intuition under ambiguity.
In a competitive test of elite benchmarks (graduate-level science exams, math competitions and tournaments, real-world programming), Grok 4 Heavy, aided by Python execution, reached scores above 90% in most domains!
That means Grok 4 Heavy performed at or above the level of top human experts in many of these elite knowledge fields simultaneously, a breadth that even an exceptional human could barely aspire to in a single one!
For example:
- Solving nearly all problems on the AIME math test (even getting 100% in some cases).
- Achieving the maximum possible score in coding and structured reasoning benchmarks.
This shows that Grok 4 Heavy is approaching expert-level performance on highly specialized, technically demanding problems across a wide range of fields at once.
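To make "aided by Python execution" concrete, here is a minimal sketch, built around a hypothetical AIME-style counting question (the problem and the function name are illustrative assumptions, not drawn from the actual benchmark), of how a model with a code tool can settle an answer by exhaustive search rather than recall:

```python
# Illustrative AIME-style question (hypothetical, not from any real exam):
# "How many positive integers n <= 1000 have a square ending in the same
# three digits as n?" A model with code execution can answer this by
# exhaustive search instead of relying on memorized number theory.

def count_matching_endings(limit: int) -> int:
    """Count n in [1, limit] where n**2 and n share their last three digits."""
    return sum(1 for n in range(1, limit + 1) if (n * n) % 1000 == n % 1000)

if __name__ == "__main__":
    # The answer is computed and checked, not recalled.
    print(count_matching_endings(1000))
```

The point is not the specific problem but the workflow: when the rules are fully specified, a machine can enumerate and verify, which is exactly the regime these benchmarks reward.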
In essence, AI shines where rules are known; humans shine where rules must be invented.
If AI now outperforms professionals in standard evaluations, what does this mean for human testing systems?
We are living through a collapse of assessment. Exams built to test recall, procedural fluency, or even logical reasoning are increasingly gameable — not just by AIs, but by humans coached to mimic machine-like behavior.
Rather than viewing AI as a competitor, assessment can be reframed to treat AI as a partner in learning and growth.
Three emerging paradigms capture this potential:
1. AI as Mentor
Students interact with AI tutors to co-solve problems, critique AI’s hallucinations, and refine prompts.
The skill being assessed? Curated curiosity and intelligent skepticism.
2. Agentic Assessment Networks
AI-driven simulations challenge students in real time, adjusting complexity and roles dynamically. Imagine a team navigating a climate crisis in a simulated environment — while being evaluated on resilience, ethical trade-offs, and systems thinking.
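As a rough sketch of the adaptive loop such a network implies (the scenario class, scoring rule, and thresholds below are illustrative assumptions, not a reference implementation):

```python
from dataclasses import dataclass
import random


@dataclass
class Scenario:
    """One round of a simulated crisis, parameterized by difficulty."""
    difficulty: int  # 1 (simple trade-off) .. 5 (cascading, ambiguous)


def evaluate_team_response(scenario: Scenario) -> float:
    # Placeholder score: a real system would combine rubric ratings from
    # human observers, peers, and AI-audited interaction logs.
    return random.random()


def run_adaptive_session(rounds: int = 5) -> list[float]:
    """Adjust scenario complexity up or down based on team performance."""
    difficulty, scores = 2, []
    for _ in range(rounds):
        score = evaluate_team_response(Scenario(difficulty))
        scores.append(score)
        # Raise the stakes after strong rounds, ease off after weak ones.
        difficulty = min(5, difficulty + 1) if score > 0.7 else max(1, difficulty - 1)
    return scores


if __name__ == "__main__":
    print(run_adaptive_session())
```

The design choice that matters is the feedback loop: performance in one round shapes the complexity and roles of the next, so the evaluation observes how a team adapts, not just what it recalls.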
3. Reflexive Ecosystems of Evaluation
Combining human judgment, peer reviews, and AI-audited trails, schools and organizations can build transparent, ethical, multiperspective evaluative frameworks. This aligns with the WEF’s vision of lifelong, inclusive, and personalized learning.
AI may be able to write essays, solve equations, and debug code, but it still struggles to understand why something matters.
It can calculate, but not care. It can analyze, but not agonize. It can advise, but not empathize.
That’s why future assessments must elevate:
- Ethical reasoning
- Narrative construction
- Community impact
- Emotional regulation
- Ecological and planetary thinking
These are not just “soft skills.” They are existential skills — those that enable us to stay human in the age of machines.
In this new world, the question is no longer what AI can do. The more urgent question is: How should humans assess themselves, each other, and the skills needed to thrive in a world increasingly co-authored by machines?
At the same time, education itself is undergoing a radical shift. Driven by the forces of open source, open AI, open code, and open innovation, learning ecosystems are evolving from static, centralized models to dynamic, decentralized architectures.
Under this scenario, educators become Chief Curators of Learning, selecting, integrating, and governing open tools and AI tutors. Institutions shift from owning content to enabling ecosystems. Assessment becomes multimodal, contextual, and community-informed.
With roles like AI Pedagogy Engineers and Ecosystem Architects emerging, assessment is no longer about grading static answers; it is about facilitating dynamic, lifelong learning journeys. In this transformation, learners are not passive subjects but co-authors of the future.
Assessment is no longer a one-way process. It is a conversation between humans and machines, between institutions and learners, between the present and the possible.
AI provides unprecedented intelligence. But open education provides the values, trust, and architectures needed to make that intelligence meaningful. If we design wisely, we will not only assess better, we will learn better, together.

By Fernando Valenzuela Migoya | President -
Thu, 07/17/2025 - 06:30