JSON Trouble on the Default-lines

Note: I leaned heavily on Google Gemini for this investigation, blog post, and the necessary blog post illustration.

We've been refining our configuration system using JSON schemas as the source of truth. The goal: let users provide a sparse config.yaml and have our application fill in all the defaults automatically. What we discovered about how different languages handle JSON Schema defaults was illuminating.

The Expectation

The main library we're using is the json_schemer gem with its insert_property_defaults: true option. We considered the more popular json-schema gem (500+ million downloads vs json_schemer's 80 million), but it's stuck on JSON Schema draft-05 and no longer actively maintained. Multiple GitHub issues confirm draft-06+ support remains incomplete despite years of requests.

We expected json_schemer to take our JSON schema (complete with default keywords) and a minimal user config, then produce a fully populated configuration hash. For example, if our schema defined a site section with host defaulting to "localhost:3000", even an empty user config should result in site: { host: "localhost:3000", ... }.

The Problem

Simple top-level defaults worked. Defaults within existing, valid config sections worked. But the deeper "scaffolding" – creating missing nested objects and filling their defaults – didn't happen comprehensively.

The issue was the interaction between default: {} (indicating an object should be created if missing) and the required keyword. If an object was created from its default (e.g., site: {}) but immediately failed validation because required properties were missing, json_schemer stopped filling defaults for the properties inside that invalid object.

The Experiment

To determine if this was json_schemer-specific, we tested across three environments:

  1. Ruby with json_schemer
  2. Node.js with ajv
  3. Python with jsonschema

Test schema structure: nested objects with default: {} and required fields at multiple levels.

{
  "type": "object",
  "properties": {
    "config_section": {
      "type": "object",
      "default": {},
      "properties": {
        "setting1_with_default": { "type": "string", "default": "default_for_setting1" },
        "setting2_required_no_default": { "type": "boolean" },
        "nested_object": {
          "type": "object",
          "default": {},
          "properties": {
            "deep_setting_with_default": { "type": "integer", "default": 42 },
            "deep_setting_required_no_default": { "type": "string" }
          },
          "required": ["deep_setting_required_no_default"]
        }
      },
      "required": ["setting2_required_no_default", "nested_object"]
    },
    "top_level_prop_with_default": {
      "type": "string",
      "default": "default_for_top_level"
    }
  }
}
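
Before looking at the results, it helps to know which errors no defaults pass could ever remove: properties that are required but declare no default. Here is a minimal sketch that lists them, using plain Ruby with no gems. The helper name required_without_default and the constant TEST_SCHEMA are ours for illustration, and the schema is a cut-down copy of the one above.

```ruby
# Cut-down copy of the test schema above (top-level string property omitted).
TEST_SCHEMA = {
  "type" => "object",
  "properties" => {
    "config_section" => {
      "type" => "object", "default" => {},
      "properties" => {
        "setting1_with_default" => { "type" => "string", "default" => "default_for_setting1" },
        "setting2_required_no_default" => { "type" => "boolean" },
        "nested_object" => {
          "type" => "object", "default" => {},
          "properties" => {
            "deep_setting_with_default" => { "type" => "integer", "default" => 42 },
            "deep_setting_required_no_default" => { "type" => "string" }
          },
          "required" => ["deep_setting_required_no_default"]
        }
      },
      "required" => ["setting2_required_no_default", "nested_object"]
    }
  }
}.freeze

# Hypothetical helper: paths of properties that are required but have no
# `default`, so they must always come from the user.
def required_without_default(schema, path = "")
  props = schema.fetch("properties", {})
  found = schema.fetch("required", []).filter_map do |key|
    "#{path}/#{key}" unless props.fetch(key, {}).key?("default")
  end
  props.each do |key, sub|
    found.concat(required_without_default(sub, "#{path}/#{key}")) if sub["type"] == "object"
  end
  found
end

puts required_without_default(TEST_SCHEMA)
```

Any library should report these two properties as missing on empty input. The interesting question is what happens to the defaults around them.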

Ruby Results

With empty input:

Data AFTER validation:
{"config_section" => {}, "top_level_prop_with_default" => "default_for_top_level"}

Validation FAILED. Errors:
  1. Path: /config_section, Error: required, Details: {"missing_keys" => ["setting2_required_no_default", "nested_object"]}

config_section was created as {}, but setting1_with_default and deep_setting_with_default were not applied. The required failure stopped the cascade.

Node.js Results

ajv with useDefaults: true behaved differently:

Data AFTER validation:
{
  config_section: {
    setting1_with_default: 'default_for_setting1', // Applied!
    nested_object: { deep_setting_with_default: 42 }    // Applied!
  },
  top_level_prop_with_default: 'default_for_top_level'
}

Validation FAILED. Errors: [required field errors]

ajv scaffolded the nested structure with defaults first, then reported validation errors.

Python Results

Python's jsonschema does not apply defaults on its own, so we extended the validator to do so. With that custom extension, Python mirrored ajv's behavior: defaults were applied before required errors were flagged.

The Difference

json_schemer: Checks an object's required constraints before filling defaults inside it, and skips default-filling for an object that fails.

ajv / Python: Apply defaults globally first, then validate.

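Both approaches are valid per the JSON Schema specification, which treats default as an annotation and does not prescribe how or when an implementation should apply it. Still, json_schemer's behavior meant our strategy wouldn't work.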
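
To make the difference in ordering concrete, here is a toy model in plain Ruby. It is not json_schemer's or ajv's actual code, and the method names (validate_then_fill, fill_then_validate) are ours. It handles only the default, properties, and required keywords, which is enough to reproduce the two shapes of output we saw for empty input.

```ruby
# Toy model only: neither method is json_schemer's or ajv's real implementation.
TEST_SCHEMA = {
  "type" => "object",
  "properties" => {
    "config_section" => {
      "type" => "object", "default" => {},
      "properties" => {
        "setting1_with_default" => { "type" => "string", "default" => "default_for_setting1" },
        "setting2_required_no_default" => { "type" => "boolean" },
        "nested_object" => {
          "type" => "object", "default" => {},
          "properties" => {
            "deep_setting_with_default" => { "type" => "integer", "default" => 42 },
            "deep_setting_required_no_default" => { "type" => "string" }
          },
          "required" => ["deep_setting_required_no_default"]
        }
      },
      "required" => ["setting2_required_no_default", "nested_object"]
    },
    "top_level_prop_with_default" => { "type" => "string", "default" => "default_for_top_level" }
  }
}.freeze

# Deep copy so a default such as {} is never shared between results.
def copy(value)
  Marshal.load(Marshal.dump(value))
end

def missing_required(schema, data)
  schema.fetch("required", []).reject { |key| data.key?(key) }
end

# Insert defaults for missing keys, yielding each child that is an object.
def insert_defaults(schema, data)
  schema.fetch("properties", {}).each do |key, sub|
    data[key] = copy(sub["default"]) if !data.key?(key) && sub.key?("default")
    yield sub, key if data[key].is_a?(Hash)
  end
end

# Ordering A: check `required` first; an object that fails gets no defaults.
def validate_then_fill(schema, data, errors, path = "")
  missing = missing_required(schema, data)
  unless missing.empty?
    errors << "#{path}: missing #{missing.join(', ')}"
    return data
  end
  insert_defaults(schema, data) { |sub, key| validate_then_fill(sub, data[key], errors, "#{path}/#{key}") }
  data
end

# Ordering B: fill defaults all the way down, then check `required`.
def fill_then_validate(schema, data, errors, path = "")
  insert_defaults(schema, data) { |sub, key| fill_then_validate(sub, data[key], errors, "#{path}/#{key}") }
  missing = missing_required(schema, data)
  errors << "#{path}: missing #{missing.join(', ')}" unless missing.empty?
  data
end

[:validate_then_fill, :fill_then_validate].each do |strategy|
  errors = []
  result = send(strategy, TEST_SCHEMA, {}, errors)
  puts "#{strategy}: #{result.inspect}"
  errors.each { |e| puts "  error at #{e}" }
end
```

With empty input, ordering A leaves config_section as an empty hash, while ordering B scaffolds the nested structure and still reports the missing required fields.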

Our Solution

We're adopting a "Deep Merge" strategy:

  1. Generate Full Defaults: Traverse our schema to build a complete Ruby hash with all defaults applied
  2. Load User Config: Read the sparse config.yaml
  3. Deep Merge: User settings override defaults, but all defaults are present
  4. Validate: Pass the complete hash to json_schemer for type/format/constraint validation

This ensures predictable, fully defaulted configuration regardless of how sparse the user's input is.
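
Steps 1 through 3 can be sketched with the Ruby standard library alone. This is a minimal illustration rather than our production code: defaults_from and deep_merge are hypothetical helper names, the ssl setting is invented for the example, and the walker only understands type, properties, and default (no $ref, arrays, or combinators).

```ruby
require "yaml"

# Hypothetical schema; only site.host and its default come from the post,
# the ssl property is made up for illustration.
SITE_SCHEMA = {
  "type" => "object",
  "properties" => {
    "site" => {
      "type" => "object", "default" => {},
      "properties" => {
        "host" => { "type" => "string", "default" => "localhost:3000" },
        "ssl" => { "type" => "boolean", "default" => false }
      }
    }
  }
}.freeze

# Recursively merge two hashes; values from `override` win on conflicts.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Step 1: walk the schema and collect every `default` into a plain hash.
def defaults_from(schema)
  result = {}
  schema.fetch("properties", {}).each do |key, sub|
    if sub["type"] == "object"
      nested = defaults_from(sub)
      # Create the section if it declares a default or has defaulted children.
      if sub.key?("default") || !nested.empty?
        result[key] = deep_merge(nested, sub.fetch("default", {}))
      end
    elsif sub.key?("default")
      result[key] = sub["default"]
    end
  end
  result
end

# Step 2: load the sparse user config (inline here instead of config.yaml).
user_config = YAML.safe_load("site:\n  ssl: true\n") || {}

# Step 3: user settings override defaults, but every default is present.
config = deep_merge(defaults_from(SITE_SCHEMA), user_config)
puts config.inspect

# Step 4 (not shown): hand `config` to json_schemer for validation.
```

Because the defaults hash is built without consulting required at all, a missing required field can no longer block defaults elsewhere in the same section. It simply surfaces as a validation error in step 4.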


Appendix: Test Scripts

For those interested in replicating these tests, here are the test scripts.

Ruby (json_schemer_test.rb):

Node.js (ajv_test.js):

Python (python_jsonschema_test.py):