About the Web of Biological Data
This is a prototype Web of Biological Data (WOBD), a research effort to make infectious disease and immune-related datasets and knowledge easier to find, query, reuse, and connect with other data sources in an AI-ready open knowledge network.
This work is supported by the U.S. National Science Foundation under award 2535091.
Relationship to Proto-OKN
The WOBD work extends ideas from the NSF Proto-OKN program: a publicly accessible, interconnected set of knowledge graphs and data services aimed at trustworthy, data-driven discovery. WOBD is conceived as a focused path within that broader fabric, connecting harmonized biomedical dataset metadata from the NIAID Data Ecosystem Portal with gene expression data from the Gene Expression Atlas, Wikidata, and other knowledge graphs so researchers can move from questions to datasets and related biological context in fewer steps.
Dataset metadata: the NIAID Data Ecosystem (NDE)
The primary structured metadata behind many graphs in this project comes from the NIAID Data Ecosystem Discovery Portal (NDE). The NDE aggregates and harmonizes dataset records from domain-specific and generalist repositories (clinical, epidemiological, multi-omic, and more) into a unified search index. That index offers a Schema.org–aligned schema, filters for host, pathogen, condition, and technique, and an API for programmatic access. Records point back to their source repositories rather than replacing them; the value is consistent discovery across sources.
For a full description of the portal's design and scope, see the resource report "The NIAID Discovery Portal: A Unified Search Engine for Infectious and Immune-Mediated Disease Datasets" (arXiv:2509.13524). Metadata harvested from the NDE pipeline is published as the NDE graph and loaded alongside other graphs in the federation used by this application.
Gene Expression Atlas (GXA) data
The GXA graph is built from experiments in the EMBL-EBI Gene Expression Atlas (GXA), specifically its public differential-expression and functional studies. Experiment packages are retrieved from the Atlas infrastructure, parsed, and emitted as Biolink-compatible linked data (study metadata, contrasts, genes, and pathway enrichment). Users can then run queries that span NDE dataset metadata, GXA expression results, and other knowledge graphs in the federation, combining and interpreting those results in one workflow.
FRINK registry and how things connect
Knowledge graphs are listed in the FRINK registry, each with a short name, title, description, and a link to its query endpoint. The FRINK SPARQL federation exposes those graphs so they can be queried individually or in combination, depending on setup. The NDE graph is published in that ecosystem, and the GXA graph is published the same way. At a high level, WOBD places NDE metadata (what datasets exist, how they are annotated, and where to get them) on the same query plane as the curated biological knowledge (genes, diseases, drugs, pathways, expression contrasts) already present in FRINK, so templated SPARQL queries can span dataset discovery and mechanistic context without moving between siloed portals.
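For readers who want to see what querying a registered graph endpoint can look like, here is a minimal sketch of a SPARQL 1.1 Protocol GET request using only the Python standard library. The endpoint URL is a placeholder, not a real FRINK address, and the helper names (build_request, parse_bindings, run_select) are hypothetical; substitute the query endpoint listed for a graph in the registry.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder (assumption): replace with a query endpoint from the FRINK registry.
ENDPOINT = "https://example.org/sparql"


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload: dict) -> list[dict]:
    """Flatten a SPARQL JSON results document into one plain dict per row."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> list[dict]:
    """Send a SELECT query to the endpoint and return the flattened rows."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling run_select(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1") would perform the network request once ENDPOINT points at a live service; build_request and parse_bindings can be exercised without network access.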
This application
This site offers a template-based front end to that federated layer: each workflow is a predefined, validated SPARQL query pattern you can run with your own search terms or parameters. You choose a template; the app fills in the corresponding SPARQL and executes it against the registered graph endpoints, so you can explore the NDE dataset layer and related biological knowledge in FRINK without writing queries by hand.
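The template-filling step can be sketched roughly as follows. This is an illustration, not the application's actual code: the template text, the schema.org predicates it assumes for the NDE graph, and the names DATASET_SEARCH, escape_literal, and fill_template are all hypothetical. The idea it shows is that user input is escaped and bounded before it is substituted into a fixed query pattern.

```python
from string import Template

# Hypothetical template (assumption): the real WOBD templates and the
# shape of the NDE graph may differ from this pattern.
DATASET_SEARCH = Template("""\
PREFIX schema: <http://schema.org/>
SELECT ?dataset ?name WHERE {
  ?dataset a schema:Dataset ;
           schema:name ?name .
  FILTER(CONTAINS(LCASE(?name), LCASE("$term")))
}
LIMIT $limit
""")


def escape_literal(value: str) -> str:
    """Escape a user string for use inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def fill_template(term: str, limit: int = 25) -> str:
    """Return the SPARQL text for a search term, rejecting out-of-range limits."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    return DATASET_SEARCH.substitute(term=escape_literal(term), limit=int(limit))
```

For example, fill_template("influenza", limit=10) produces a query whose FILTER matches dataset names containing "influenza" and whose last line is LIMIT 10. Keeping the pattern fixed and substituting only escaped values is what lets each workflow be validated ahead of time.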
Team
- Trish Whetzel
- Ben Good
- Andrew Su
- Chris Bizon
- Ginger Tsueng
- Jim Balhoff
- Yaphet Kebede