The Data Privacy Vocabulary (DPV) is a resource produced by the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG) to represent information associated with processing of (personal and non-personal) data and use of technologies in a machine-readable and interoperable manner.
DPV provides an ontology of concepts that enable expressing information such as data and technologies involved, their purposes and legal basis, measures used for security, relevant laws and rights, and associated risks and impacts.
DPV also provides taxonomies for these concepts based on real-world applications so that the machine-readable representations are consistent and interoperable through the use of DPV concepts.
Examples of use-cases that DPV can represent:
- Email (personal data) is being collected and stored (processing operation) for sending newsletters (purpose) based on the user's (data subject) consent (legal basis).
- SeeThis is a media streaming website (service) provided by CompanyX (service provider) that uses the viewing history (personal data) to personalise recommendations (purpose).
- The data required by the SeeThis service is stored in servers (technology) located in Ireland and USA (location & jurisdiction), and transferred between them (cross-border transfer).
- A border security agency uses CCTV (technology) images (personal data) to identify terrorist suspects (purpose) as authorised by national policy (law & jurisdiction). They utilise image recognition (AI capability) to identify people (technology function) and use machine-learning (AI technique) to keep their database updated. They periodically check for potential errors (consequence) and have procedures (organisational measure) to avoid inaccurate arrests (impact).
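As a rough illustration of the first example above, the newsletter use-case could be written as a JSON-LD-style record in Python. This is only a sketch: the property names (hasPersonalData, hasProcessing, hasPurpose, hasLegalBasis), the PD namespace, the EmailAddress concept, and the example.com identifier are assumptions made for illustration and should be checked against the published DPV specification.

```python
import json

# Namespace for core DPV concepts (as in https://w3id.org/dpv#Purpose).
DPV = "https://w3id.org/dpv#"
# Assumed namespace for the Personal Data (PD) extension.
PD = "https://w3id.org/dpv/pd#"

# The newsletter example as a JSON-LD-style record.
# Property names and pd:EmailAddress are assumptions for illustration.
newsletter_record = {
    "@id": "https://example.com/process/newsletter",  # hypothetical identifier
    DPV + "hasPersonalData": [{"@id": PD + "EmailAddress"}],
    DPV + "hasProcessing": [{"@id": DPV + "Collect"}, {"@id": DPV + "Store"}],
    DPV + "hasPurpose": [{"@id": DPV + "Marketing"}],
    DPV + "hasLegalBasis": [{"@id": DPV + "Consent"}],
}

# Serialise so the record can be exchanged with another system.
serialised = json.dumps(newsletter_record, indent=2)
print(serialised)
```

Because every value is a full identifier rather than a free-text label, a recipient does not need to guess what 'Consent' or 'Store' means in the sender's terminology.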
What's in the DPV?
DPV is the 'main' specification which provides the foundational framework upon which other 'extensions' are built. DPV contains the following concepts and taxonomies:
- Purposes e.g. Marketing, Service Provision, Compliance
- Processing operations e.g. Collect, Store, Use, Share, Delete
- Data e.g. Personal Data, Sensitive Data, Special Categories, Anonymised Data
- Technical Measures e.g. Encryption, Access Control
- Organisational Measures e.g. Notice, Policy, Assessments
- Legal Basis e.g. Consent (including types and status), Contract, Legal Obligation
- Context e.g. Location, Duration, Frequency, Necessity, Statuses
- Processing Context e.g. Automation, Human Involvement, Storage Conditions, Data Source
- Risk Assessment e.g. Risk and Mitigation Measure, Consequence and Impact
- Rights e.g. Data Subject Right, Rights Exercise, Rights Fulfilment and Non-fulfilment
- Rules e.g. Permission, Prohibition, Obligation
The following extensions build on these:
- Personal Data (PD) taxonomy with indication of Sensitive/Special Categories
- Locations (LOC) based on ISO 3166-2 for indicating Countries and Regions
- Risk Assessment and Management (RISK) concepts based on ISO 31000 series
- Technology (TECH) concepts to indicate Actors, Provision Method, Intended Use
- AI extends TECH with AI techniques, capabilities, lifecycle, risks, measures
- Justifications for explaining why something should be or cannot be done
- LEGAL concepts - laws, authorities, adequacy decisions from jurisdictions, e.g.
    - Germany (DE)
    - European Union (EU)
    - United Kingdom of Great Britain and Northern Ireland (GB)
    - Ireland (IE)
    - India (IN)
    - USA (US)
  with the following specific laws defined in their own extensions:
    - EU GDPR
    - EU Data Governance Act (DGA)
    - EU Network and Information Security Directive (NIS2)
    - EU AI Act
    - EU Fundamental Rights
How does DPV enable interoperability?
DPV uses RDF and related semantic-web standards to define concepts and create interoperable data. Each concept is given a unique identifier, which enables its consistent representation across use-cases. For example, https://w3id.org/dpv#Purpose always refers to 'purpose' as a concept. Organisations that use DPV directly therefore have a consistent and interoperable way to exchange and interpret data.
Even when organisations use differing internal terminology, they can be aligned by using DPV as a 'common' vocabulary. For example, CompanyA uses 'business purpose' as the term for what is 'purpose' in DPV, and CompanyB uses 'goal' as their term. If CompanyA and CompanyB want to exchange information, they can 'map' or 'align' their respective terms to DPV's 'purpose' so that the other entity can correctly understand it.
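The alignment described above can be sketched in a few lines of Python. The dictionaries and the translate function are hypothetical names introduced for this illustration; only the DPV identifier comes from the vocabulary itself.

```python
# Shared identifier that both organisations align to.
DPV_PURPOSE = "https://w3id.org/dpv#Purpose"

# Each organisation maps its own internal term to the DPV concept.
COMPANY_A_TERMS = {"business purpose": DPV_PURPOSE}
COMPANY_B_TERMS = {"goal": DPV_PURPOSE}


def translate(term, source_terms, target_terms):
    """Translate a term between vocabularies via the shared DPV identifier."""
    iri = source_terms[term]
    for target_term, target_iri in target_terms.items():
        if target_iri == iri:
            return target_term
    raise KeyError(f"no target term aligned to {iri}")


# CompanyA's 'business purpose' is understood by CompanyB as 'goal'.
print(translate("business purpose", COMPANY_A_TERMS, COMPANY_B_TERMS))  # prints "goal"
```

Neither organisation needs to know the other's terminology in advance; each only maintains its own mapping to DPV.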
While the DPV uses RDF and semantic-web standards, this is not strictly necessary for the use of DPV. As long as the unique identifiers of DPV's concepts are retained, use-cases can use existing technologies to store and manage their information. For example, if a spreadsheet or a database stores a record of all data categories existing within the organisation, these can be annotated with DPV concepts to specify the category (e.g. sensitive personal data). Or an organisation can maintain a data dictionary mapping its internal terminology to DPV concepts so that interoperable records can be readily produced.
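For instance, a data inventory kept as a spreadsheet export could be annotated with DPV identifiers using a data dictionary. The sketch below uses only the Python standard library; the column names, the inventory contents, and the choice of the PersonalData and SensitivePersonalData concepts are assumptions made for illustration.

```python
import csv
import io

DPV = "https://w3id.org/dpv#"

# Hypothetical data dictionary: internal field names -> DPV concepts.
DATA_DICTIONARY = {
    "email": DPV + "PersonalData",
    "health_notes": DPV + "SensitivePersonalData",
}

# Hypothetical spreadsheet export listing the fields an organisation holds.
INVENTORY_CSV = """field,system
email,crm
health_notes,hr
page_views,analytics
"""


def annotate(rows, dictionary):
    """Add a dpv_category column; unmapped fields get an empty string."""
    return [dict(row, dpv_category=dictionary.get(row["field"], "")) for row in rows]


rows = list(csv.DictReader(io.StringIO(INVENTORY_CSV)))
annotated = annotate(rows, DATA_DICTIONARY)
for row in annotated:
    print(row["field"], row["system"], row["dpv_category"])
```

The existing storage format is untouched; the DPV identifier is simply an extra column that other tools can rely on.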
What can I do with DPV?
The most basic function of DPV is to represent information in a machine-readable form. For example, the ISO/IEC TS 27560:2023 technical specification uses DPV as an example of machine-readable consent records and receipts. Other forms of organisational records and documents can also be represented using DPV e.g. privacy notices, records of processing activities, risk and impact assessments, data breach records, and how/which cloud services are being used. DPV can also be used in a 'personal' capacity e.g. to indicate privacy preferences, maintain consent records, and exercise rights.
The 'hierarchical taxonomies' in DPV also support responsible use of data and technologies. For example, the purpose taxonomy includes the concept 'Personalisation', which by itself is vague as it does not indicate what the personalisation is about or for. DPV taxonomies expand this concept into different kinds of personalisation, such as personalised recommendations within service provision, which is distinct from personalised advertising. By using such hierarchies, the most accurate purpose can be selected and indicated, thereby increasing transparency.
Such hierarchies also enable using a broader (but sufficiently clear) purpose such as 'service personalisation' to justify the different personalisation activities that can occur. For example, if consent is given to the 'broader' concept of service personalisation, then the further 'narrower' or 'specific' personalisation purposes in the hierarchy associated with events, products, and activities are also enabled through that consent.
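The idea that consent to a broader purpose also covers its narrower purposes can be sketched as a walk up the hierarchy. The excerpt of the purpose taxonomy below is illustrative only: the concept names and their parent relations are assumptions and should be checked against the published purpose taxonomy.

```python
# Illustrative excerpt: each purpose mapped to its broader purposes.
BROADER = {
    "ServicePersonalisation": ["Personalisation"],
    "PersonalisedAdvertising": ["Personalisation"],
    "ProvidePersonalisedRecommendations": ["ServicePersonalisation"],
    "ProvideProductRecommendations": ["ProvidePersonalisedRecommendations"],
    "ProvideEventRecommendations": ["ProvidePersonalisedRecommendations"],
}


def is_covered_by(purpose, consented):
    """Return True if purpose equals consented or is narrower than it."""
    stack = [purpose]
    seen = set()
    while stack:
        current = stack.pop()
        if current == consented:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Continue the search through every broader purpose.
        stack.extend(BROADER.get(current, []))
    return False


# Consent to service personalisation covers product recommendations ...
print(is_covered_by("ProvideProductRecommendations", "ServicePersonalisation"))
# ... but not personalised advertising, which sits on a different branch.
print(is_covered_by("PersonalisedAdvertising", "ServicePersonalisation"))
```

The check only ever moves from narrower to broader concepts, so consent to a specific purpose is never widened to its siblings or parents.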
How does DPV deal with jurisdictions/laws?
DPV itself is intended to be jurisdiction-agnostic: its concepts, though based on GDPR terminology, do not presume any particular law to be applicable. To indicate that specific jurisdictions and laws apply, DPV provides explicit concepts: hasJurisdiction and hasApplicableLaw. In addition, DPV uses the mechanism of 'extensions', which are concepts defined in a separate namespace, to represent the concepts from different jurisdictions and laws. For example, the legal-eu extension represents the EU jurisdiction and the EU-GDPR extension represents the GDPR within the EU. Through this, DPV can support all laws and jurisdictions without overlaps between them.
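A minimal sketch of how separate namespaces keep jurisdiction-specific concepts apart from the core vocabulary is given below. The namespace IRIs for the legal-eu and eu-gdpr extensions, and the local names EU and GDPR, are placeholders assumed for illustration; hasJurisdiction and hasApplicableLaw are the DPV concepts named above.

```python
# Prefix table: each extension lives in its own namespace.
PREFIXES = {
    "dpv": "https://w3id.org/dpv#",
    "legal-eu": "https://w3id.org/dpv/legal/eu#",      # assumed namespace IRI
    "eu-gdpr": "https://w3id.org/dpv/legal/eu/gdpr#",  # assumed namespace IRI
}


def expand(curie):
    """Expand a prefixed name such as 'dpv:Purpose' into a full identifier."""
    prefix, local_name = curie.split(":", 1)
    return PREFIXES[prefix] + local_name


# A jurisdiction-agnostic record made specific by two explicit statements.
# 'legal-eu:EU' and 'eu-gdpr:GDPR' are hypothetical local names.
record = {
    expand("dpv:hasJurisdiction"): expand("legal-eu:EU"),
    expand("dpv:hasApplicableLaw"): expand("eu-gdpr:GDPR"),
}

for prop, value in record.items():
    print(prop, "->", value)
```

Since each extension has its own namespace, a concept from one law cannot be mistaken for a similarly named concept from another.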
Who is using DPV?
DPV is used by several academic and industrial research projects, commercial and non-commercial organisations, and researchers. See the DPVCG Adoption wiki page for more information.
Is the DPV 'free' to use?
DPV is provided under the W3C Software and Document License, which permits use of DPV in all use-cases with acknowledgement in published work.
How do I get involved?
The DPVCG is a W3C community group with open membership: anyone can join and is welcome to participate. Development happens in an open forum and is visible through the GitHub repo. If you are interested in using or are already using DPV, we strongly encourage you to join the DPVCG, as decisions are taken based on membership. Participation also provides a channel to suggest features and requirements, and to obtain assistance when and where needed.