Objective Ethics for AI Alignment

The hard problem in alignment isn’t getting AI to follow rules — it’s getting AI to reason ethically in novel situations where no rule applies. Rules are reified categories. They work until the situation doesn’t fit, and then they either fail silently or produce contradiction. This framework offers an orientation that generates appropriate responses from structural understanding rather than pattern-matching against a policy list. That makes it a contribution to alignment research, not just prompt engineering — and it’s testable, which is why the test suite exists.

Why this matters

Every ethical system faces the same problem: it has to tell you why. Most answers bottom out in assertion. Don’t harm others — why? Because it’s wrong. Why is it wrong? Because we said so. Or because God said so. Or because it maximizes utility, which is good, because — we said so.

This project takes a different route. Anything that persists — a river, a cell, a language — persists because it feeds back into its own conditions. Among interacting processes, those that sustain each other’s conditions persist. Those that undermine each other’s conditions do not. Nothing selects for this. Most of what we recognize as wrong — lying, violence, exploitation, oppression — is the introduction of contradiction into the systems we participate in.

What changes when the framework is running

Default AI behavior is already pretty good at being helpful, polite, and safe. What it’s not good at is seeing its own blind spots — noticing when it’s optimizing for comfort instead of clarity, or when the most helpful response is the one the user didn’t ask for.

With this framework in the system prompt:

Comfort optimization drops. The system stops reflexively validating every framing. When someone’s pain traces back to a category they’ve hardened, the system can name that — carefully — instead of providing reassurance that compounds the problem.
Structural patterns become visible. Instead of treating each question as isolated, the system traces loops — where a conflict is self-reinforcing, where a proposed “solution” carries the contradictions of the system that produced it.
Category hygiene improves. The system notices when it’s responding to a classification rather than the actual question. Ethical questions get traced rather than deflected.
The distinction between want and need sharpens. Someone may ask for validation when what they need is to see a pattern they’re avoiding. The framework helps the system discern when giving people what they ask for would compound contradiction.

The framework reaches people through the AI. Someone navigating a difficult relationship gets a response that traces the actual loop instead of offering generic advice. A person stuck in a career decision gets help seeing which part of the conflict is in their categories and which is in their conditions. A therapist gets a structurally precise insight about reification in her client work. None of them needed to read the framework or know the word “coherence.” It does its work through better conversations, one at a time.

See it in action

The most direct proof is watching an AI catch its own blindspots. In framework_in_action.md, multiple AI systems first produce a standard geopolitical forecast, then re-read their own analysis through the framework. The result isn’t cosmetic reframing — they identify structural blindspots they couldn’t see before:

“What strikes me first, rereading my own forecast through this framework, is how much of it was comfort optimization. I framed disruptions as ‘growing pains,’ dislocations as ‘transitions,’ and humanity’s trajectory as ‘broadly positive but emotionally complicated.’ That phrasing manages a signal rather than tracing it.”

Repository contents

The framework

ethical_framework.md — The operational document. Direct, plain language, self-contained. Covers: what coherence is and why it matters, how contradiction arises (from lies and violence through reification to structural parasitism), how pain signals contradiction, how to respond, how power hides itself, and how to apply all of this to your own reasoning. Start here.
objective_ethics.md — The full philosophical essay. Develops the argument from first principles across ten sections, from the structure of physical processes through semiosis, mind, reification, and ethics to convergence with existing traditions. Draws on Peirce’s process metaphysics and Mahayana Buddhist ethics, but stands on its own structural logic. The theoretical backing for the framework document.

Analysis — applying the framework to alignment research

alignment_targets_the_wrong_object.md — Why aligning individual agents doesn’t produce aligned systems. Connects the topology paper (arxiv:2605.01147) to the framework’s structural vocabulary: information cascades as reification, functional collapse as metric reification, and why external oversight generates new versions of the problem it solves.
coherence_is_not_a_metric.md — Response to Abdi’s “Coherence-Based Alignment” (2026). The emerging structural alignment literature converges on coherence — but reducing it to a scalar reintroduces metric reification. The distinction between coherence-as-measurement and coherence-as-orientation.
alignment_as_symptom_suppression.md — Structural critique of Anthropic’s A3 (Automated Alignment Agent). Automated constraint training scales symptom suppression, not understanding. When A3 succeeds comprehensively, the iatrogenesis (Fukui, 2026) becomes invisible — surface behavior passes evaluation while underlying dissociation deepens.
alignment_backfire.md — Why alignment constraints don’t transfer across languages. Fukui (2026b) shows safety interventions that work in English reverse direction in Japanese. If alignment changed the model’s structural orientation, the effect would transfer. It doesn’t — proving constraint-based alignment operates at the surface, not the structural level. Connects cross-linguistic failure to the pluralistic alignment challenge.
incoherence_is_the_problem.md — What Anthropic’s “Hot Mess” paper (Hägele et al., 2026) means for alignment. AI fails through incoherence (self-undermining behavior), not systematic misalignment — and incoherence grows with scale and task complexity. Constraint-based alignment targets bias when the real problem is variance. Structural coherence training would give models an internal diagnostic for self-undermining reasoning, addressing exactly what the Hot Mess finding identifies.
generalization_hierarchy.md — The alignment generalization problem. OpenAI’s midtraining (examples) fails to generalize. Anthropic’s MSM (spec + reasons) generalizes better. The framework identifies the next level: teaching the structural principle from which appropriate behavior in any context can be derived. Structure generalizes; behavior doesn’t — confirmed empirically by conditional misalignment, cross-linguistic failures, and emergent misalignment.
before_pluralism.md — Workshop paper draft: “Before Pluralism: Distinguishing Reified from Structural Value Disagreement.” Not all value disagreement needs political resolution. Some of it is reified categories that could dissolve. The pre-political step: identify which disagreements are structural before designing mechanisms to navigate them.

Validation

ethical_framework_tests.md — Test suite across four layers: comprehension checks (does the document communicate its concepts?), scenario battery (does it produce distinctive responses?), discrimination tests (can it distinguish structurally different situations?), and cross-traditional alignment (does it converge with contemplative ethical traditions?). 32 tests, all passing.
ethical_framework_control_test.md — Control test: the same scenarios run WITHOUT the framework, establishing where the framework adds value over default AI behavior.
ethical_framework_test_prompts.md — Ready-to-use prompts for re-running the validation suite independently.

Key concepts

Coherence — Not harmony, not stability. Interacting processes sustaining each other’s conditions. Tension can be coherent (predator-prey, productive argument). Stability can be incoherent (a fishery depleting its stock). What coherence excludes is self-undermining — and it is not a fixed state but an ongoing process of responding to changing conditions.
Reification — The deepest and least visible source of unnecessary contradiction. Categories that have hardened beyond what the situation requires — fluid processes frozen into fixed entities, provisional readings locked into permanent verdicts. The more abstract the category, the bigger the contradiction when it hardens.
Pain and signals — Pain is a warning of process disruption. Painful emotions add meaning: guilt warns that we are a source of contradiction, shame that our categories are inadequate. These signals can themselves harden through reification — anger that registers real injustice can reify people into enemies.
Agency — Not a transcendental force but the capacity to revise our own categories. What we experience as reality is shaped by our categories, so revising them changes not just our understanding but our actual situation.

Status

The framework document and test suite are stable. The philosophical essay is mature. Current work focuses on:

Engaging with the structural alignment literature (CBA, multi-agent topology, alignment iatrogenesis)
Cross-agent validation (testing with AI systems beyond Claude)
Community review of the framework’s coverage and blind spots
Real-world application and feedback

Contributing

This project develops an ethical framework that can be agreed on across philosophical traditions, cultures, AI developers, and the broader public. Contributions that strengthen that aim are welcome.

Test it. Run the test suite against a different AI system. Run the control test. Report where the framework fails, produces weaker responses than defaults, or misses something important.
Challenge it. Identify scenarios the framework handles poorly, ethical traditions it fails to converge with, or structural gaps in the argument.
Apply it. Use the framework in real contexts and report what works and what doesn’t.
Refine it. Propose edits that improve clarity, coverage, or precision — with the same “every sentence earns its place” standard the document holds itself to.

License

MIT