Per-field breakdown
The headline "% with any IPTC" hides a lot of detail. Below is the population rate of each IPTC field across the 6,845 images analysed in the latest crawl. Field names link to their definitions in the IPTC Photo Metadata Standard. The "scored" badge marks the four fields that contribute to a site's overall score (the "Four Cs" of news photo provenance); the others are tracked for context but don't affect the score — see the methodology for why.
| Field | Type | Present | Total | Population | Score weight |
|---|---|---|---|---|---|
| Creator | scored | 479 | 6,766 | 7.1% | 25% |
| DateCreated | tracked | 462 | 6,766 | 6.8% | — |
| CreditLine | scored | 379 | 6,766 | 5.6% | 25% |
| Copyright | scored | 351 | 6,766 | 5.2% | 25% |
| ObjectName | tracked | 315 | 6,766 | 4.7% | — |
| CaptionDescription | scored | 296 | 6,766 | 4.4% | 25% |
| Source | tracked | 259 | 6,766 | 3.8% | — |
| LocationCreated | tracked | 221 | 6,766 | 3.3% | — |
| Keywords | tracked | 172 | 6,766 | 2.5% | — |
| WebStatement | tracked | 108 | 6,766 | 1.6% | — |
| LicensorURL | tracked | 80 | 6,766 | 1.2% | — |
| DataMining | tracked | 46 | 6,766 | 0.7% | — |
| UsageTerms | tracked | 44 | 6,766 | 0.7% | — |
| Genre | tracked | 18 | 6,766 | 0.3% | — |
| LicensorName | tracked | 2 | 6,766 | <0.1% | — |
| DigitalSourceType | tracked | 1 | 6,766 | <0.1% | — |
| AIPromptInformation | tracked | 0 | 6,766 | 0% | — |
| AIPromptWriterName | tracked | 0 | 6,766 | 0% | — |
| AISystemUsed | tracked | 0 | 6,766 | 0% | — |
| AISystemVersionUsed | tracked | 0 | 6,766 | 0% | — |
| AltTextAccessibility | tracked | 0 | 6,766 | 0% | — |
| ExtendedDescriptionAccessibility | tracked | 0 | 6,766 | 0% | — |
| LocationShown | tracked | 0 | 6,766 | 0% | — |
| Total | | | | | 100% |
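The Population column is simply Present divided by Total. The following is a minimal sketch of that arithmetic, not the crawler's actual code: the function names `population_rate` and `format_rate` are hypothetical, and the choice to print "<0.1%" for small non-zero counts is an assumption made here so that one or two matching images are not displayed as a true zero.

```python
def population_rate(present: int, total: int) -> float:
    """Return the share of images carrying a field, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * present / total


def format_rate(present: int, total: int) -> str:
    """Format a population rate to one decimal place.

    Assumption for this sketch: non-zero counts that would round
    down to 0.0 are shown as "<0.1%" rather than as zero.
    """
    if present == 0:
        return "0%"
    rate = population_rate(present, total)
    if rate < 0.05:
        return "<0.1%"
    return f"{rate:.1f}%"


# Counts copied from the table; TOTAL is the table's Total column.
TOTAL = 6766
counts = {"Creator": 479, "WebStatement": 108, "LicensorName": 2}

for field, present in counts.items():
    print(f"{field}: {format_rate(present, TOTAL)}")
```

With the counts from the table this prints 7.1% for Creator, 1.6% for WebStatement and <0.1% for LicensorName.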
Bars are scaled to the most populated field, not to 100%, so differences within the
long tail stay visible. The score weight column reflects the per-field weights
in config/scoring.yaml: the four scored fields are weighted 25% each.
Tracked fields show "—" because their presence does not contribute to the score.
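As an illustration of the two ideas in this note, the sketch below scales bar widths relative to the most populated field and applies the four 25% weights. It rests on assumptions: `bar_widths` and `weighted_score` are hypothetical names, the weights are hard-coded here rather than read from config/scoring.yaml, and treating the score as a weighted sum of per-field population rates is a guess for illustration. The methodology remains the authority on how a site's score is actually computed.

```python
def bar_widths(rates: dict[str, float]) -> dict[str, float]:
    """Scale each rate to the largest one, so the top field gets width 1.0."""
    peak = max(rates.values(), default=0.0)
    if peak == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / peak for name, rate in rates.items()}


# Weights as described in the text: four scored fields at 25% each.
# Hard-coded for this sketch; the real values live in config/scoring.yaml.
SCORE_WEIGHTS = {
    "Creator": 0.25,
    "CreditLine": 0.25,
    "Copyright": 0.25,
    "CaptionDescription": 0.25,
}


def weighted_score(rates: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed formula: weighted sum of population rates (in percent).

    Tracked fields have no entry in `weights`, so they never contribute.
    """
    return sum(w * rates.get(field, 0.0) for field, w in weights.items())


# Population rates (percent) derived from the table's Present / Total counts.
TOTAL = 6766
present = {
    "Creator": 479,
    "DateCreated": 462,
    "CreditLine": 379,
    "Copyright": 351,
    "CaptionDescription": 296,
}
rates = {name: 100.0 * n / TOTAL for name, n in present.items()}

widths = bar_widths(rates)
print(f"DateCreated bar width: {widths['DateCreated']:.3f}")
print(f"Weighted score: {weighted_score(rates, SCORE_WEIGHTS):.2f}")
```

Scaling to the peak rather than to 100% is what keeps the bars readable: a bar for DateCreated drawn against 100% would fill under 7% of the available width, whereas drawn against Creator it fills about 96%.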
What this tells us
Even the best-populated field, Creator at 7.1%, appears on fewer than one in ten
images we fetched. The fields most relevant to licensing (WebStatement at 1.6%,
LicensorURL at 1.2%) are both below 2%. The dataset as a whole tells a consistent
story: publishers and CDNs are discarding metadata that the photographers and
agencies almost certainly embedded upstream.