Dataset Metadata Model

Scope

The dataset metadata model (DSMM) is based on requirements defined by the target stakeholder groups. The user requirements for the content were harmonized with the INSPIRE metadata regulation and implemented using the Ecological Metadata Language (EML) metadata specification, version 2.1.1. EnvEurope focused on the EML metadata standard (stemming from the US LTER activities) but also provides an interface for discovering ISO-compliant metadata, based on the EML2ISO XSLT transformation developed in the project and the open-source GeoNetwork CSW implementation.

Document history

Version: DSMM 1.0 2016-08-24


Figure: UML class diagram of the DSMM

Terms and definitions

DATASET (concept definition): a single data file or a series of data files described by one dataset metadata record. The documented data object can be either a physical file or a data service (e.g. WFS, WMS). For datasets, metadata as specified in the DEIMS community profile (link to INSPIRE and EML) or the INSPIRE MD Specification needs to be provided.

Notes: The datasets are linked to the Sites and the Data Products. The latter link needs to be established in the next updated version of the DSMM.


1. DATASET TITLE
Element 1.1 Title
Definition Provides the name by which the documented dataset is known within the community; the dataset is described in detail by the following elements. The title is characteristic, often unique, and the most informative element of a metadata record. It usually has the highest priority because search engines look at this element.
Recommendation & Hints The title has to be concise and describe the dataset precisely. It should not contain unexplained acronyms or abbreviations. A maximum length of 200 characters is recommended, and the title should stay close to the original title of the dataset in the sense of the ‘official naming’ established in the community. If the dataset is part of a larger project, it is recommended to indicate the Project at the end of the title, in brackets. In Project names, abbreviations are allowed, as long as the rest of the title follows the guidelines above and the abbreviation is spelled out immediately in the abstract.
Format Free text
Multiplicity [1] - Single and mandatory element
Required mandatory
Example Precipitation measured near Lake Santo Parmense and Lake Scuro Parmense (1951-2010)
Reference List
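The title recommendations lend themselves to a simple automated check. The following is a minimal Python sketch, not part of DEIMS: the function name check_title and the warning texts are hypothetical, and only the 200-character limit and the mandatory status are taken from the description above.

```python
from typing import List

MAX_TITLE_LENGTH = 200  # recommended maximum length from the hints above


def check_title(title: str) -> List[str]:
    """Return warnings for a dataset title; an empty list means no warnings."""
    warnings = []
    stripped = title.strip()
    if not stripped:
        # The title is a mandatory element with multiplicity [1].
        warnings.append("title is mandatory and must not be empty")
    if len(stripped) > MAX_TITLE_LENGTH:
        warnings.append(
            f"title has {len(stripped)} characters; "
            f"recommended maximum is {MAX_TITLE_LENGTH}"
        )
    return warnings
```

Checks such as "no unexplained acronyms" need human judgement and are deliberately left out of this sketch.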

2. DATASET IDENTIFIER
Element(s) 2.1 Site Name
Definition Value uniquely identifying a dataset. Its final form is composed of the Site Code retrieved from the Site Name and a unique identifier defined by the proposed template.
Recommendation & Hints The identifier shall contain standardized codes of the Network and the Site that are derived from the Site Name. Additionally, it should identify the dataset within your working environment (network, organization, laboratory, parameters observed, time coverage, etc.) using the internal codes in use there. When you type the site name, the system provides a list of research sites matching the query. If none is available, it is strongly recommended to first create a site metadata record using the editing form available and then return to the dataset metadata definition.
Format Text, Reference to [Content Type] SITE
Multiplicity [1..n]
Globally as single and mandatory element composed of 1 or more site names.
Required mandatory
Example Zone Atelier Seine - France
Reference List Reference list [Content Type] SITE
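The Multiplicity rows throughout this model use a bracket notation such as [1], [0..1], [1..n] and [0..n]. As an illustration only, the notation could be parsed as follows; the helper names are hypothetical, and reading a lower bound of 1 or more as "mandatory" is an assumption inferred from how the rows in this document are annotated.

```python
import re
from typing import Optional, Tuple

# Matches "[1]", "[0..1]", "[1..n]" and similar anywhere in a Multiplicity row.
_MULTIPLICITY = re.compile(r"\[(\d+)(?:\.\.(\d+|n))?\]")


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum); maximum is None for an unbounded 'n'."""
    match = _MULTIPLICITY.search(text)
    if match is None:
        raise ValueError(f"no multiplicity found in {text!r}")
    minimum = int(match.group(1))
    upper = match.group(2)
    if upper is None:
        return (minimum, minimum)  # "[1]" means exactly one value
    if upper == "n":
        return (minimum, None)
    return (minimum, int(upper))


def is_mandatory(text: str) -> bool:
    """Assumption: an element is mandatory when at least one value is required."""
    return parse_multiplicity(text)[0] >= 1
```

For example, parse_multiplicity("[1..n]") yields (1, None), so is_mandatory("[1..n]") is true, while "[0..n]" is optional.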

3. DATASET CREATOR & CONTACT POINTS
Element(s) 3.1 Dataset Owner/Creator
3.2 Dataset contact point
Definition Provides the full name of the person(s) who created the dataset or who serve as contact points.
Recommendations & Hints When you type the name of a person, the system provides a list of people matching the query. If none is available, it is strongly recommended to first create a person metadata record using the editing form available and then return to the dataset metadata definition.
Format Text, Reference to [Content Type] PERSON
Multiplicity [1..n] for both Creator and Contact point
Required mandatory
Example Peterseil, Johannes
Oggioni, Alessandro
Reference List Reference to [Content Type] PERSON

4. DATASET METADATA PROVIDER
Element(s) 4.1 Metadata provider
Definition Provides the full name of the person(s) who created the documentation of a dataset.
Recommendations & Hints When you type the name of a person, the system provides a list of people matching the query. If none is available, it is strongly recommended to first create a person metadata record using the editing form available and then return to the dataset metadata definition.
Format Text, Reference to [Content Type] PERSON
Multiplicity [1..n]
Required mandatory
Example Frenzel, Mark
Reference List Reference to [Content Type] PERSON

5. DATASET METADATA DATE
Element(s) 5.1 Date
Definition Provides the date of metadata creation or last update.
Recommendations & Hints The date is generated automatically and cannot be entered manually.
Format Date as YYYY-MM-DD
Multiplicity [1]
Required mandatory
Example 2014-12-18
Reference List n/a

6. DATASET PUBLICATION DATE
Element(s) 6.1 Date of publication
Definition Represents the date when the actual dataset was published on the LTER DEIMS. Any other maintenance activity is documented as a date of last revision.
Recommendations & Hints If the dataset has not yet been published (e.g. because of quality validation checks, pending access rights definition, or further expected content updates), leave this field empty. A publication date can be typed as YYYY-MM-DD or taken from the pop-up calendar.
Format Date as YYYY-MM-DD
Multiplicity [1]
Required optional
Example 2016-08-18
Reference List n/a
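A typed publication date can be checked against the YYYY-MM-DD format before it is stored. The sketch below is only an illustration using Python's standard library; the function name is hypothetical, and treating an empty value as "not yet published" follows the hint above.

```python
from datetime import date, datetime
from typing import Optional


def parse_publication_date(value: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string; an empty value means 'not yet published'."""
    value = value.strip()
    if not value:
        return None  # the element is optional
    # strptime raises ValueError for malformed or impossible dates.
    return datetime.strptime(value, "%Y-%m-%d").date()
```

With this sketch, "2016-08-18" parses to a date object, an empty string returns None, and an impossible date such as "2016-02-30" raises ValueError.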


7. DATASET LANGUAGE
Element(s) 7.1 Language
Definition The language in which the textual parts of the dataset are written.
Recommendations & Hints This covers the names of parameters and their units collected within the dataset. Any other language used in textual information shall be referenced here as well. If the dataset does not contain any textual information (e.g. only codes and digits), the language should default to the metadata language, which is English by default.
Start typing the language name and the system will offer you options to select.
Format Reference (LOV) based on ISO 639 standardized nomenclature used to classify all known languages
Multiplicity [1..n]
Required optional
Example English
Reference List ISO 639 standardized nomenclature used to classify all known languages
  • aar|Afar
  • abk|Abkhazian
  • ace|Achinese
  • ach|Acoli
  • ada|Adangme
  • ady|Adyghe; Adygei
  • afa|Afro-Asiatic languages
  • afh|Afrihili
  • afr|Afrikaans
  • ain|Ainu
  • ...
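
The reference list entries follow a code|name pattern. Assuming the list is available as plain text lines in exactly this form (an assumption; how DEIMS stores the list is not described here), a lookup table could be built as in this hypothetical sketch:

```python
from typing import Dict, Iterable


def parse_language_list(lines: Iterable[str]) -> Dict[str, str]:
    """Map ISO 639 codes to language names from 'code|name' entries."""
    languages = {}
    for line in lines:
        # Drop indentation and the leading bullet character, if present.
        entry = line.strip().lstrip("•").strip()
        if "|" not in entry:
            continue  # skips blank lines and the "..." placeholder
        code, name = entry.split("|", 1)
        languages[code.strip()] = name.strip()
    return languages


sample = ["  • aar|Afar", "  • ady|Adyghe; Adygei", "  • ..."]
print(parse_language_list(sample))
# {'aar': 'Afar', 'ady': 'Adyghe; Adygei'}
```
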
8. DATASET ABSTRACT
Element(s) 8.1 Abstract
Definition Provides a brief overview of the dataset being documented.
Recommendations & Hints The dataset abstract is a succinct description that shall include:
  • A brief summary with the most important details that summarise the data aggregated in this dataset.
  • Coverage: a textual description of the extent or location in addition to the bounding box.
  • Main attributes - e.g. values of the parameters X,Y within the time frame T1-T2, etc.
  • Data sources, Legal references.
  • Importance of the work.
  • Do not use unexplained acronyms.
  • Summarise the most important details in the first sentence or first 100 characters.
Format Free text
Multiplicity [1]
Required mandatory
Example Dataset provides monthly precipitation data collected at different weather stations that are closest to Lake Santo Parmense and Lake Scuro Parmense. We provide a long-term dataset (1951-2010) collected at two stations near the town of Bosco di Corniglio. In particular, from 1951 to 1998 precipitation data were collected at the weather station of Bosco Centrale. Between 1999 and July 2000 the station was out of order (no data available) and from 2001 the Bosco Centrale weather station was substituted by the Bosco di Corniglio station. The two weather stations are less than 1 km apart and they are about 6 km from Lake Santo Parmense and 7 km from Lake Scuro Parmense. We also provide a shorter dataset of precipitation data collected from 1994 to 2010 at the weather station of Lagdei. This station is closer to our sampling sites (about 1 km from Lake Santo Parmense and 5 km from Lake Scuro Parmense), but a shorter time series is available. All weather stations are managed by the Environmental Protection Agency of the Emilia-Romagna region (ARPA Emilia-Romagna), which ...
Reference List n/a

9. DATASET KEYWORD SET
Element(s) 9.1 EnvThes Keywords
9.2 Free Keywords
Definition Provides a set of related keywords describing the content of the dataset, derived from the controlled vocabulary implemented by EnvThes (a thesaurus for long-term ecological research, monitoring and experiments) and from other environment-related thesauri such as EUNIS Habitats and INSPIRE Spatial Data Themes. Additional concepts can be defined as free keywords.
Recommendations & Hints The keywords from the following groups can be selected:
Format Reference to [Taxonomy] LTER Controlled Vocabulary and Free text.
Multiplicity [1..n]
Required EnvThes Keywords mandatory
Free Keywords optional
Example Permanent oligotrophic lakes, ponds and pools
microclimate
maps
ecosystem ecology
phenological stage
rainfall chemical analysis
organism classification
water properties
ecosystem processes
LTSER platform
compartment
Environmental monitoring facilities
Reference List Reference to [Taxonomy] LTER Controlled Vocabulary, which is regularly updated

10. DATASET ACCESS AND USE CONSTRAINTS
Element(s) 10.1 Principal and granted permission
Definition Provides a list of rules defining the permissions granted for a dataset.
Recommendations & Hints It is recommended that the 3-year rule be implemented and that the data owner be required to take specific action to change this.
Format Predefined list of user groups and permission granted to them.
Multiplicity [1..n]
Required optional
Example Research >> Free for access
Public >> Other restrictions according to rules defined in intellectual rights
Education >> Free for access and use upon request
Reference List Use constraints are defined for the following User Groups:
  • Administration
  • Education & Training
  • Public
  • Research
  • LTER-Europe
  • Others

Use constraints are defined by the following Permissions:

  • Free for access
  • Free for access and use upon request
  • Other restrictions according to rules defined in intellectual rights
  • Restricted access defined in detail in intellectual property information
  • No access
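
A set of constraint rules can be checked against the two reference lists above. The following Python sketch is an assumption-laden illustration: the function name, the pair representation and the error messages are hypothetical; only the group and permission labels come from the lists.

```python
from typing import Iterable, List, Tuple

USER_GROUPS = {
    "Administration", "Education & Training", "Public",
    "Research", "LTER-Europe", "Others",
}
PERMISSIONS = {
    "Free for access",
    "Free for access and use upon request",
    "Other restrictions according to rules defined in intellectual rights",
    "Restricted access defined in detail in intellectual property information",
    "No access",
}


def check_constraints(rules: Iterable[Tuple[str, str]]) -> List[str]:
    """Return error messages for (user group, permission) pairs not in the lists."""
    errors = []
    for group, permission in rules:
        if group not in USER_GROUPS:
            errors.append(f"unknown user group: {group}")
        if permission not in PERMISSIONS:
            errors.append(f"unknown permission: {permission}")
    return errors
```

A rule such as ("Research", "Free for access") passes, whereas a group label that is not in the reference list is reported as unknown.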

11. DATASET INTELLECTUAL RIGHTS
Element(s) 11.1 Intellectual rights
Definition Intellectual Rights provides a list of rights management statements for the dataset, or references a URL (web address) that provides such information. Rights information encompasses Intellectual Property Rights (IPR), copyright, and various property rights. These rights may also include detailed requirements for use, requirements for attribution, or other requirements the owner would like to impose.
Recommendations & Hints Select a matching option from the list. If none matches, use the option Other IPR and provide a free-text description or, if the IPR information is available from an online source, paste the URL pointing to that source. To select multiple options, hold the CTRL key.
Format Predefined list of IPR statements, and Other IPR for user defined free text statements.
Multiplicity [0..n] - Metadata is optional (provided if necessary)
Required optional
Example Mutual agreement on reciprocal sharing of data
The opportunity to collaborate on the project using the dataset
Formal acknowledgement of the dataset providers
Reference List
  • Co-authorship on publications resulting from use of the dataset
  • The data provider must be offered co-authorship for publications using this dataset at least within the metadata description
  • Formal acknowledgement of the dataset providers
  • The opportunity to collaborate on the project using the dataset
  • At least part of the costs of dataset acquisition, retrieval or provision must be recovered.
  • The opportunity to review the results based on the dataset
  • Reprints of articles using the dataset must be provided to the data provider
  • The dataset provider is given a complete list of all products that make use of the dataset
  • Legal permission for dataset use is obtained
  • Mutual agreement on reciprocal sharing of data
  • The data provider is given and agrees to a statement of uses to which the dataset will be put
  • Other rights

12. DATASET ONLINE DISTRIBUTION
Element(s) 12.1 Online Locator
|-- 12.1.1 Distribution function
|-- 12.1.2 Distribution URL
|---- 12.1.2.1 Web address Title
|---- 12.1.2.2 Web address URL
|-- 12.1.3 Email
Definition Web address is the "navigation section" of a metadata record pointing users to the location (URL) where a dataset can be retrieved directly, or provides information about how to acquire a dataset.
Recommendations & Hints Setting up the correct resource locators is important for connecting the data with the services that provide access to them, or for providing additional information about the resource. If a Web address for the dataset is available, the Dataset Locator shall be a valid URL providing one of the following: a link to a web page with further instructions; a link to a web service capabilities document; a link to a client application (web data portal) that directly accesses the dataset. If a dataset is available only offline, it may be uploaded into the system and made available online with the access and use constraints and IPR defined previously.
Format Distribution function: Text (Reference list)
Web address title Text(255)
Web address URL Valid URL
E-Mail Valid email address
Multiplicity [0..n] for Dataset locator
Metadata is conditional (shall be provided if no Dataset file element is provided)
Required optional
Example Example Data Service (SOS)
Web address function: Access to the dataset by LTER Europe FTP, SOS or Linked Data service
Web address title: LTER Europe Sensor Observation Service (SOS)
Web address URL: http://sp7.irea.cnr.it/tomcat/envsos/sos?REQUEST=getcapabilities&service...
Example Referenced Data File
Web address function: Access to the dataset by another service or data portal
Web address title: B2SHARE Link
Web address URL: http://hdl.handle.net/11304/85d26a94-eac2-11e5-9bb4-2b0aad496318
Example reference to E-Mail
Web address function: email address
Email: johannes.peterseil@umweltbundesamt.at
Reference List Reference list for Web address function
  • N/A
  • Information about the dataset
  • Access to the dataset by LTER Europe FTP, SOS or Linked Data service
  • Access to the dataset by another service or data portal
  • email address

Note: The reference list for the 'web address function' needs to be discussed (e.g. GetCapabilities, GetMap (OGC WMS), GetFeature (OGC WFS), GetObservation (OGC SOS), DescribeSensor (OGC SOS)). The reference list should be detailed accordingly.
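
The Format rows for 12.1 ask for a "Valid URL" and a "Valid email address" without defining either. The sketch below shows one possible, deliberately loose interpretation using Python's standard library; the function names, the accepted URL schemes and the email heuristic are all assumptions and not DEIMS rules.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "ftp"}  # assumption, not a DEIMS rule


def looks_like_locator_url(value: str) -> bool:
    """True if the value has an allowed scheme and a host part."""
    parts = urlparse(value.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def looks_like_email(value: str) -> bool:
    """Very loose heuristic: 'local@domain' with a dot in the domain, no spaces."""
    value = value.strip()
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and "." in domain and " " not in value
```

Neither check confirms that the resource or mailbox actually exists; both only catch obviously malformed entries.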

Element(s) 12.2 WMS Related
|-- 12.2.1 WMS Map
|-- 12.2.2 WMS Map Web Address
|---- 12.2.2.1 Web address Title
|---- 12.2.2.2 Web address URL
Definition The WMS Map Web address is the "navigation section" of a metadata record pointing users to the location (URL) where an (optional) OGC WMS representation of the dataset can be retrieved directly.
Recommendations & Hints The WMS Map Web Address provides an additional link to an OGC WMS representation of the dataset. This link can be provided specifically here, or by using the basic 'Online Resource Locator'.
Note: The field needs to be discussed for the next version of the DSMM.
Format WMS Map Image (File upload, jpg, jpeg, png, gif) Max. 2 MB
Web Address Title Text(255)
Web Address URL Valid URL to the OGC WMS GetMap request
Multiplicity [0..1]
Required optional
Example Web address title: OGC WMS Example Map
Web address URL: http://giswebservices.massgis.state.ma.us/geoserver/wms?VERSION=1.1.1&RE...
Reference List n/a

13. DATA SOURCES
Element(s) 13.1 Name
13.2 Description
13.3 Source
|-- 13.3.1 File upload
|-- 13.3.2 Header Lines
|-- 13.3.3 Footer Lines
|-- 13.3.4 Orientation
|-- 13.3.5 Quote character
|-- 13.3.6 Field delimiter
|-- 13.3.7 Record delimiter
Definition A data source is defined as a physical data upload of a dataset to DEIMS. A data source is described by a number of fields, describing the structure of the data file.
For a dataset MD record [0..n] data files can be physically uploaded and described.
Recommendations & Hints The concept of the data source is used when data files are physically uploaded to DEIMS. This is not recommended. Please use the 'online distribution link' instead, pointing to an online resource, e.g. a file uploaded to B2SHARE.
Format [Content Type]
Multiplicity [0..n]
Required optional
Example data file
Reference List n/a

14. GEOGRAPHIC (RESEARCH SITE)
Element(s) Entity type Research site as geographic reference
14.1 Name of Research Location
14.2 Description
14.3 Research Location ID
14.4 Related Site
14.5 Geographic Location
|-- 14.5.1 North bound coordinate
|-- 14.5.2 South bound coordinate
|-- 14.5.3 West bound coordinate
|-- 14.5.4 East bound coordinate
|-- 14.5.5 Maximum Altitude
|-- 14.5.6 Minimum Altitude
14.6 Images
14.7 Details
Definition The geographic reference for the dataset is given by the entity type Research site, which is the location where specific observations are made. Grouping the information in this way allows the entity type RESEARCH SITE to be reused. RESEARCH SITES are the observation plots within a SITE (e.g. LTER SITE or LTSER PLATFORM).
The RESEARCH SITE is the extent of the dataset in the geographic space, given as a bounding box. Defining the coordinates of a boundary rectangle representing the dataset area on a map allows the discovery by geographical area. It consists of: Northern bound coordinate of the limit of the dataset extent, expressed in latitude in decimal degrees (positive north), Southern bound coordinate of the limit of the dataset extent, expressed in latitude in decimal degrees (positive north), Western bound coordinate of the limit of the dataset extent, expressed in longitude in decimal degrees (positive east) and Eastern bound coordinate of the limit of the dataset extent, expressed in longitude in decimal degrees (positive east).
In addition the bounding altitudes, images and additional details can be provided.
Recommendations & Hints Please select a Research site from the list, or create a new one with 'Create Research Site' if it is missing. For the Research Site, the bounding box shall be as small as possible and shall be expressed in decimal degrees with a precision of at least two decimals. The coordinates of the bounding boxes shall be expressed in any geodetic coordinate reference system with the Greenwich Prime Meridian. You can define the bounding box either by drawing a polygon on a map or by entering the bounding coordinates manually. To activate the drawing functionality and to move, delete or reshape the polygon, use the editing toolbar in the top right corner of the map.
If you would like to define more than one set of bounding coordinates, please add more RESEARCH SITES to the dataset MD record.
Format Reference to [Content Type] RESEARCH SITE
Multiplicity [1..n]
Required mandatory
Example Zöbelboden_IP1, Zöbelboden_IP2, Zöbelboden_IP3
Reference List Reference to [Content Type] Research Site
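
The bounding box rules in the definition (latitudes positive north, longitudes positive east, both in decimal degrees) can be expressed as a few range checks. This is a minimal sketch under stated assumptions: the class name, field names and messages are hypothetical, and the coordinate values in the usage note are made up for illustration rather than taken from a real research site.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    north: float  # latitude, decimal degrees, positive north
    south: float  # latitude, decimal degrees, positive north
    west: float   # longitude, decimal degrees, positive east
    east: float   # longitude, decimal degrees, positive east

    def problems(self) -> List[str]:
        """Return a list of detected problems; empty if the box looks plausible."""
        found = []
        for name in ("north", "south"):
            if not -90.0 <= getattr(self, name) <= 90.0:
                found.append(f"{name} latitude outside [-90, 90]")
        for name in ("west", "east"):
            if not -180.0 <= getattr(self, name) <= 180.0:
                found.append(f"{name} longitude outside [-180, 180]")
        if self.south > self.north:
            found.append("south bound lies north of north bound")
        # west > east is not flagged: such a box may cross the antimeridian.
        return found
```

For instance, BoundingBox(north=47.85, south=47.83, west=14.43, east=14.45).problems() returns an empty list, while swapping north and south reports one problem. The "at least two decimals" precision rule concerns how values are entered and is not checked here.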

15. DATASET TEMPORAL EXTENT
Element(s) 15.1 From date
15.2 To date
Definition Defines the time period covered by the content of the dataset. This period may be expressed as a time (an individual date) or date ranges (interval of dates/From-To) or a mix of individual and interval dates.
Recommendations & Hints
Format Date
Multiplicity [1]
Required mandatory
Example From date: 01/01/2014
To date: 12/31/2014
Reference List n/a
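
A simple consistency check for the temporal extent is that the 'To date' does not precede the 'From date'. The Format row only says "Date", so the sketch below assumes the MM/DD/YYYY form seen in the example above; the function name and that format choice are assumptions.

```python
from datetime import date, datetime
from typing import Tuple

EXAMPLE_FORMAT = "%m/%d/%Y"  # assumption: matches the example "12/31/2014"


def parse_extent(from_date: str, to_date: str) -> Tuple[date, date]:
    """Parse both dates and verify that the interval is not reversed."""
    start = datetime.strptime(from_date, EXAMPLE_FORMAT).date()
    end = datetime.strptime(to_date, EXAMPLE_FORMAT).date()
    if end < start:
        raise ValueError("'To date' is earlier than 'From date'")
    return start, end
```

An individual date can be represented by passing the same value for both arguments.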

16. DATASET TAXONOMIC COVERAGE
Element(s) 16.1 Biological Classification
Definition Provides information about the taxonomic classification of the organisms represented in the dataset.
Recommendations & Hints This field is applicable only for biotic data, e.g. when biotic diversity has been chosen in the Keyword tab.
Depending on the content of the dataset, provide information about the most common level of taxonomy aggregation (e.g. plants: family, marine invertebrates: phylum or class, etc.)
It is recommended to use a common catalogue of species, for example GBIF or EUNIS.
Format Reference to [Taxonomy] Biological Classification
Multiplicity [0..n] Metadata is conditional (shall be provided for biotic datasets)
Required optional
Example Fagus sylvatica L.
Reference List Reference to [Taxonomy] Biological Classification

17. DATASET METHODS DESCRIPTION
Element(s) 17.1 Method Online Reference
|-- 17.1.1 Web address title
|-- 17.1.2 Web address URL
17.2 Description
Definition Provides repeated sets of elements that document a series of procedures followed to produce any dataset object.
Recommendations & Hints If the method description is available online (e.g. previous research resulting in new methodologies, guidelines, specifications, and standards) provide the Method title and URL pointing to the description. This information shall include information about procedure steps, software used within individual steps, source data and any quality measures taken. All information included here should help future data user to evaluate and understand more about the dataset content, thus allow the user to determine whether he/she would be able to combine the dataset within his/her workflows.
If no online resource is available, provide the Method title and a comprehensive description.
References can be found in the Method and related concepts available in EnvThes.
Format Web address title Text(255)
Web address URL Valid URL
Description Text
Multiplicity [1..n]
Required mandatory
Example Web address title: permanent plots
Web address URL: http://vocabs.lter-europe.net/EnvThes/USLterCV_409
Description: long-term sample locations used for measuring/estimating biophysical parameters of the environment.
Reference List Method and related concepts available in EnvThes

18. DATASET INSTRUMENTATION DESCRIPTION
Element(s) 18.1 Instrumentation
Definition Provides information about any instruments used in the data collection or quality control and quality assurance.
Recommendations & Hints Instrumentation is a textual description of the used devices including the parameter observed. The description should provide information about:
  • parameter
  • device type
  • device brand and type number
  • producer: company or country
  • additional notes

Use the instruments already available in the system through autocomplete functionality if applicable.

Format Free text
Multiplicity [0..n] Metadata is optional
Required optional
Example Reversing thermometer associated with sampling bottle.
Reference List

19. DATASET SAMPLING DESCRIPTION
Element(s) 19.1 Representative area of sampling
|-- 19.1.1 Spatial scale
19.2 Sampling frequency
|-- 19.2.1 Sampling time span
|-- 19.2.2 Minimum sampling unit
Definition Provides information about the sampling part of the method, such as measurement frequency and spatial scale.
Recommendations & Hints Select the values for spatial scale, sampling time span and minimum sampling unit from the available lists.
If the available lists do not contain the required value, use the last option Other and define a new value in the box that appears below.
Format Predefined set of values extracted from ECOPAR
Free text for option Other
Multiplicity [1..n] for Spatial scale - metadata is mandatory
[1] for Sampling time span - metadata is mandatory
[1] for Minimum sampling unit - metadata is mandatory
Required mandatory
Example Spatial scale: 10x50M (Line Transects)
Sampling time span: Weekly
Minimum sampling unit: 30min
Reference List Predefined values defined in ECOPAR.

20. QUALITY ASSURANCE
Element(s) 20.1 Quality assurance
Definition Provides information on QA/QC procedures applied for the data.
Recommendations & Hints Please provide information on QA/QC procedures applied for the data and quality information in the data.
Format Free text
Multiplicity [0..1] Metadata is optional
Required optional
Example Values are quality checked using automatic outlier control (R-Script) as well as visual inspection of data. The QA/QC procedure is applied ...
Reference List n/a

21. DATASET LEGAL OBLIGATION REPORTING
Element(s) 21.1 Legal act
Definition Provides information whether the dataset has been reported to the local, regional or national bodies to fulfil the obligations from particular legal regulations.
Recommendations & Hints Select from list of EU directives.
For national or regional directives use option Other and provide references.
Even if the element is optional it is recommended to provide information linking to related policies through the regulations.
Format Predefined list of relevant EU directives.
Free text for option Other
Multiplicity [0..n] Metadata is optional
Required optional
Example Water Framework Directive (00/60/EEC).
Reference List
  • Habitats Directive (92/43/EEC)
  • Water Framework Directive (00/60/EEC)
  • Bird Directive (79/409/EEC)
  • Marine Strategy Framework Directive
  • Water Policy Directive
  • None
  • Other directive
