About the Web of Biological Data

This is a prototype of the Web of Biological Data (WOBD), a research effort to make infectious-disease and immune-related datasets and knowledge easier to find, query, reuse, and connect with other data sources in an AI-ready open knowledge network.

This work is supported by the U.S. National Science Foundation under award 2535091.

Relationship to Proto-OKN

The WOBD work extends ideas from the NSF Proto-OKN program: a publicly accessible, interconnected set of knowledge graphs and data services aimed at trustworthy, data-driven discovery. WOBD is conceived as a focused path within that broader fabric, connecting harmonized biomedical dataset metadata from the NIAID Data Ecosystem Portal with gene expression data from the Gene Expression Atlas, Wikidata, and other knowledge graphs so researchers can move from questions to datasets and related biological context in fewer steps.

Dataset metadata: the NIAID Data Ecosystem (NDE)

The primary structured metadata behind many graphs in this project comes from the NIAID Data Ecosystem Discovery Portal (NDE). The NDE aggregates and harmonizes dataset records from domain-specific and generalist repositories (clinical, epidemiological, multi-omic, and more) into a unified search index. It provides a Schema.org-aligned schema, filters for host, pathogen, condition, and technique, and an API for programmatic access. Records point back to their source repositories rather than replacing them; the value is consistent discovery across sources.
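
As a rough illustration of programmatic access, the sketch below builds a free-text search URL. The base URL and the `q` and `size` parameter names are assumptions that this page does not state; check the portal's API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: base URL and parameter names are illustrative, not confirmed here.
NDE_QUERY_URL = "https://api.data.niaid.nih.gov/v1/query"


def build_search_url(term: str, size: int = 10, base_url: str = NDE_QUERY_URL) -> str:
    """Return a GET URL that searches dataset records for a free-text term."""
    if size < 1:
        raise ValueError("size must be positive")
    # urlencode percent-encodes the term and turns spaces into '+'.
    return f"{base_url}?{urlencode({'q': term, 'size': size})}"


if __name__ == "__main__":
    print(build_search_url("West Nile virus", size=5))
```

The function only builds the URL, so it can be inspected or logged before any request is sent.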

For a full description of the portal's design and scope, see the resource report "The NIAID Discovery Portal: A Unified Search Engine for Infectious and Immune-Mediated Disease Datasets" (arXiv:2509.13524). Metadata harvested from the NDE pipeline is published as the NDE graph and loaded alongside other graphs in the federation used by this application.

Gene Expression Atlas (GXA) data

The GXA graph is built from experiments in the EMBL-EBI Gene Expression Atlas (GXA), specifically its public differential-expression and functional studies. Experiment packages are retrieved from the Atlas infrastructure, parsed, and emitted as Biolink-compatible linked data (study metadata, contrasts, genes, and pathway enrichment). Users can then run queries that span NDE dataset metadata, GXA expression results, and other knowledge graphs in the federation, and combine and interpret those results in one workflow.

FRINK registry and how things connect

Knowledge graphs are listed in the FRINK registry, each with a short name, title, description, and a link to its query endpoint. The FRINK SPARQL federation exposes those graphs so they can be queried individually or in combination, depending on setup. The NDE graph is published in that ecosystem; the GXA graph is published similarly. At a high level, WOBD places NDE metadata (what datasets exist, how they are annotated, and where to get them) on the same query plane as the curated biological knowledge (genes, diseases, drugs, pathways, expression contrasts) already present in FRINK. Templated SPARQL queries can therefore span dataset discovery and mechanistic context without switching between siloed portals.
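
To make the query plane concrete, here is a minimal sketch of sending a query to a SPARQL endpoint using the standard SPARQL 1.1 Protocol and flattening the JSON results. The endpoint URL is a placeholder (look up the real one in the FRINK registry), and the `schema:Dataset` / `schema:name` pattern and the namespace form are assumptions about how NDE records are modeled, not something this page confirms.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; take the real URL from the FRINK registry.
ENDPOINT = "https://example.org/federation/sparql"

# Assumption: illustrative query; class and property names are not confirmed here.
QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?dataset ?name WHERE {
  ?dataset a schema:Dataset ;
           schema:name ?name .
} LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request with the query as the raw body."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run(endpoint: str, query: str) -> list[dict]:
    """Send the query and return flattened rows (performs a network call)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return rows(json.load(resp))
```

Calling `run(ENDPOINT, QUERY)` against a real endpoint would return one dict per result row; unbound optional variables are simply absent from a row.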

This application

This site offers two ways to access WOBD content. The templated queries page is a template-based front end to the federated layer: each workflow is a predefined, validated SPARQL query pattern you can run with your own search terms or parameters. You choose a template; the app fills in the corresponding SPARQL and executes it against the registered graph endpoints, so you can explore the NDE dataset layer and related biological knowledge in FRINK without writing queries by hand. The unified MCP server exposes the same graphs (and many more) to AI assistants for open-ended, natural-language exploration.
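
The template-filling step can be sketched as follows. This is not the application's actual code: the template text, the `escape_literal` and `fill` helpers, and the limit bounds are all hypothetical, chosen only to show how a user-supplied term might be escaped before it is substituted into a fixed query pattern.

```python
from string import Template

# Hypothetical template; the app's real templates are not shown on this page.
DATASET_SEARCH = Template("""\
PREFIX schema: <http://schema.org/>
SELECT ?dataset ?name WHERE {
  ?dataset a schema:Dataset ;
           schema:name ?name .
  FILTER(CONTAINS(LCASE(?name), LCASE("$term")))
}
LIMIT $limit
""")


def escape_literal(text: str) -> str:
    """Escape characters that would end or corrupt a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def fill(template: Template, term: str, limit: int = 25) -> str:
    """Substitute an escaped search term and a bounded row limit into the template."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    return template.substitute(term=escape_literal(term), limit=int(limit))


if __name__ == "__main__":
    print(fill(DATASET_SEARCH, 'influenza "A"', limit=10))
```

Keeping the query shape fixed and escaping only the parameter values is what lets a template stay validated no matter what the user types.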

Unified MCP server

The mcp-proto-okn project provides a single Model Context Protocol server that consolidates 27 Proto-OKN knowledge graphs — including the NDE and GXA graphs used by WOBD — into one interface. Through it, an AI assistant can discover relevant graphs, inspect their schemas, run SPARQL queries with automatic ontology expansion, bridge identifiers across graphs (genes, chemicals, diseases, locations, industry codes), and synthesize results from multiple sources in a single conversation. See the unified server documentation for the full tool list and design.

Connect to the public endpoint

A hosted instance is available at https://frink.apps.renci.org/mcp/proto-okn/mcp. Point your MCP-capable client at that URL to query the graphs without running anything locally.
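
For readers curious what an MCP client sends first, the sketch below builds (but does not send) a JSON-RPC `initialize` request for the hosted URL. It assumes the endpoint speaks MCP's HTTP transport and accepts the protocol version shown; neither is confirmed on this page, and the client name is a placeholder. In practice an MCP-capable client handles this handshake for you.

```python
import json
import urllib.request

MCP_URL = "https://frink.apps.renci.org/mcp/proto-okn/mcp"  # from the text above


def initialize_request(url: str, client_name: str = "example-client") -> urllib.request.Request:
    """Build the JSON-RPC 2.0 'initialize' message that opens an MCP session."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: the server accepts this protocol revision.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # HTTP-based MCP servers may answer with plain JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the actual handshake; that call is left out so the example has no network side effects.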

Run it locally with Claude Desktop

Add the following to your claude_desktop_config.json and restart Claude Desktop:

{
  "mcpServers": {
    "proto-okn": {
      "command": "/full/path/to/uv",
      "args": ["--directory", "/path/to/mcp-proto-okn", "run", "mcp-proto-okn-unified"]
    }
  }
}
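
If your claude_desktop_config.json already lists other servers, the entry above has to be merged in rather than pasted over the file. A minimal sketch of that merge follows; the helper names are hypothetical, and the two paths are the same placeholders used in the config above, which you must replace with real locations.

```python
import copy
import json
from pathlib import Path


def merge_server(config: dict, uv_path: str, repo_dir: str) -> dict:
    """Return a copy of the config with the proto-okn entry added or replaced."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["proto-okn"] = {
        "command": uv_path,
        "args": ["--directory", repo_dir, "run", "mcp-proto-okn-unified"],
    }
    return merged


def update_config_file(path: Path, uv_path: str, repo_dir: str) -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = merge_server(config, uv_path, repo_dir)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

For example, `update_config_file(Path("claude_desktop_config.json"), "/full/path/to/uv", "/path/to/mcp-proto-okn")` keeps any existing servers and settings intact. Back up the file first, since it is rewritten in place.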

The server exposes tools in four groups: discovery (list graphs, route a question to likely graphs, get descriptions), schema and query (inspect schemas, run SPARQL, run federated multi-graph queries, pull reusable query templates), cross-graph (identifier bridging and ontology descendant expansion), and visualization (schema diagrams and chat transcripts).

Team

Trish Whetzel
Ben Good
Andrew Su
Chris Bizon
Ginger Tsueng
Jim Balhoff
Yaphet Kebede