DoM Annual Reports

A Database of a Million Veterans

LAWRENCE LEUNG, MD (left), and PHILIP TSAO, PHD

The goal is simple but ambitious: collect samples and medical data from a million American veterans to create an enormous database of medical information. For Philip Tsao, PhD, research professor of cardiovascular medicine, and Lawrence Leung, MD, Maureen Lyles D’Ambrogio Professor of Medicine and senior associate dean for Veteran Affairs, the Million Veterans Program, or MVP, is a way to enhance both veterans’ health and the medical field in general.

Tsao and Leung have been collaborators for years, and when Leung started work as chief of staff at the Palo Alto VA, he invited Tsao to join him. Leung believed that the VA — a nationally integrated hospital system with records going back decades, and in fact the first adopter of what is now the electronic health record, or EHR — was an ideal place for genomics research.

So seven years ago Tsao moved his lab to the VA, and they began their work.

Both doctors thought the Palo Alto VA in particular was an excellent site for genomics research, with roughly a million outpatient encounters per year and a close relationship with Stanford, where they’d be able to, as Tsao says, “leverage the local talent” in various departments, including medicine, genetics, and statistics. Tsao explains that the VA would provide “an opportunity to really quickly collect a large cohort.”

The Beginnings of MVP
Leung and Tsao’s interest in genomics research led them to Washington, DC, where they hoped to pitch their project plans to national VA leaders. As it turned out, that was when they learned about MVP, an effort much like theirs that was already in motion. That was 2011; today more than 50 VA sites across the country are recruiting individuals for MVP. The program has passed 700,000 participants, “well on the way to a million,” Tsao says, and the team is already thinking about going beyond that milestone.

MVP participants donate at least four sources of data: a blood sample, access to their electronic health record, a baseline lifestyle survey with demographic information, and a more extensive lifestyle survey with detailed dietary information as well as other medical statistics. All participants can opt out of any part of the voluntary program, but many do everything, including the longer lifestyle survey.
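
A minimal sketch of what a participant record with per-component consent might look like (purely illustrative: MVP’s actual schema is not described in this article, so every field name below is hypothetical):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class MVPParticipant:
        # Hypothetical sketch of the four data sources described above; each
        # component is optional because the program is voluntary and
        # participants may opt out of any part.
        participant_id: str
        blood_sample_id: Optional[str] = None    # biospecimen for genotyping
        ehr_access_consented: bool = False       # permission to link the EHR
        baseline_survey: Optional[dict] = None   # demographics and lifestyle
        extended_survey: Optional[dict] = None   # detailed diet and other health data
        recontact_consented: bool = False        # agreed to follow-up contact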

They’re also asked to consent to be re-contacted once their data has been processed. For Tsao this is a crucial part of the project both for the veterans and the larger medical world: Their data can be revisited, their health resampled to “see how their biological signals are changing over time.” And if researchers discover a correlation between, for example, genomic material and a particular disease, they can go back to individuals and study them in more detail, in what Tsao describes as “types of fine mapping studies” that will be crucial as the program goes forward.

Veterans proved to be ideal genomic study subjects for another reason: their patriotism. “They’re very much interested in continuing to serve their country,” Tsao says, adding that he’s heard dozens of participants say that participating in MVP is “one way they can contribute, not only to their brothers in arms but also to their country. The research effort may not help them individually but it will help not only their brothers but also generations to come. Veterans are very interested in research that will pay forward.”

Translating Data into Results
So far, so good. But the next step is both daunting and slightly ambiguous: What will they do with all the information they’ve collected? Seven years in, the data is being organized, and qualified researchers are beginning to access it. As Tsao states, “Some of our first papers are just coming out, and we’re very excited about not only what has been done up to this point, but the potential of the study itself.” For example, the team at the Stanford/Palo Alto VA recently published a study in Nature Genetics that greatly expands the number of known genetic factors that contribute to lipid levels. (High levels of these blood fats are a major risk factor for heart disease.)

The possibilities raised by this type of data are exciting, Tsao says. “We know that certain risk factors such as blood pressure and your cholesterol level are important for heart attacks, and we now can go back decades and get people’s cholesterol levels over time. We can look at their maximum cholesterol level, we can look at the trajectory, and we can look at what the interaction with different drugs may have been.”
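
To make that concrete, here is a minimal sketch (with invented readings; record formats in the real data will differ) of the two longitudinal summaries Tsao mentions, the maximum level and the overall trajectory:

    from datetime import date

    # Hypothetical record: (measurement date, total cholesterol in mg/dL)
    readings = [
        (date(1998, 3, 14), 212.0),
        (date(2005, 7, 2), 241.0),
        (date(2012, 11, 20), 198.0),
        (date(2019, 6, 5), 184.0),
    ]

    max_level = max(level for _, level in readings)

    # Trajectory as a least-squares slope, in mg/dL per year.
    xs = [(d - readings[0][0]).days / 365.25 for d, _ in readings]
    ys = [level for _, level in readings]
    n = len(readings)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )

    print(f"max cholesterol: {max_level} mg/dL, trend: {slope:+.2f} mg/dL per year")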

The Palo Alto VA has also launched its own center: the VA Palo Alto Epidemiology Research and Information Center, or ERIC, to facilitate the analysis of MVP-gathered data. The center will take advantage of the proximity to Stanford and involve contributions from many Stanford-based programs in harvesting MVP’s data for research. Collaborators include Tim Assimes, MD, PhD, associate professor of cardiology and epidemiology; Michael Snyder, PhD, professor and chair of genetics at the Stanford University School of Medicine; Wing Wong, PhD, Stephen R. Pierce Family Goldman Sachs Professor in Science and Human Health and professor of biomedical data science; and Hua Tang, PhD, professor of genetics and statistics.

“There’s a diverse and deep amount of talent at Stanford,” Tsao says. This type of collaboration leads to novel methods for approaching biology: Tsao, Leung, Snyder, and colleagues recently published a paper describing a new technique that harnesses the power of machine learning applied to genetic data and health records.

Tsao and Leung are currently co-directors of the Palo Alto MVP program, as well as co-directors of ERIC with Assimes. Tsao is one of the three principal investigators for the nationwide MVP program, and he’s the principal investigator of one of the first approved studies to examine the MVP data: a study on cardiometabolic disease with Assimes and Jennifer Lee, MD, PhD, associate professor of endocrinology and epidemiology. Lee and Assimes are also involved in a study to incorporate some of the work of Nigam Shah, MBBS, PhD, associate professor of biomedical informatics, into the VA electronic health record to improve the phenotyping of individuals, which they will then apply to their genomics work. 

The Future of MVP
The far-reaching goals of MVP can be overwhelming, Tsao says. “One of the fears would be that we make a lot of discoveries and then we inundate both patient and provider to a point where it becomes more harm than good,” he explains. “Beyond the science there’s a whole host of work that needs to be done to integrate this into health care.”

But he is optimistic about the program’s larger promise. “The ultimate goal would be to discover diagnostics, prognostics, and theranostics that could be eventually brought into the clinic. And of course understanding the basic underpinnings of disease and how we can apply those to identify individuals who are at risk, and then help in the management of both disease and health.”

Both Leung and Tsao clearly believe in the enormous potential of this study. “MVP is the crown jewel of VA research,” Leung says. “Palo Alto VA, in close partnership with the Stanford School of Medicine, will continue to play a leading role in the translation of this program in defining precision medicine.”

Can AI Really Improve Care?

ARNOLD MILSTEIN, MD (right), collaborates with FEI-FEI LI, PHD, director of the Artificial Intelligence Lab

Arnold Milstein, MD, came to Stanford eight years ago with a simple assignment: Find out how to lower the national cost of producing great health care. Put another way, if we could find more affordable ways to deliver better care for conditions that consume the bulk of the country’s health care spending, more money would be available for other ways to improve human well-being — like education and social services.

Milstein was ideally suited to the task. He spent two decades working to improve health care value in the private sector, after which he served as an advisor to Congress and the White House. In 2011 he created Stanford’s Clinical Excellence Research Center (CERC). It is the first university-based research center exclusively dedicated to discovering, testing, and disseminating cost-saving innovations in clinically excellent care.

One of CERC’s areas of emphasis is discovering how artificial intelligence (AI) can prevent inadvertent and costly failures in intended care delivery. This focus began with a call from Professor Fei-Fei Li, PhD, director of the Artificial Intelligence Lab in the Stanford School of Engineering.

“Our subsequent conversations sparked a decision to create a unique cross-school Partnership in AI-Assisted Care, which we call PAC. We imagined a world in which AI improves the performance of a broad range of human services that affect health,” Milstein says.

“We initially focused solely on health care in order to learn and make a difference before we expand our use of AI to improve performance across a broad range of health-affecting services,” adds Milstein, who turns to a favorite initial target: lowering the incidence of hospital-acquired conditions or HACs.

“Every time a patient in a U.S. hospital acquires an infection that they didn’t come in with, human misery follows, along with tens of thousands of dollars added to the cost of the hospitalization,” he explains.

“No clinician wants to impose hospital-acquired infections on their patients. But clinicians are busy. They’re human. They’re imperfect. So they don’t always notice when they’ve just skipped a critical intended action step.”

That led to thinking about how artificial intelligence could be used to help detect and correct — in real time — deviations in essential clinical actions, like maintaining hand hygiene, which is a primary way to prevent hospital-acquired infections.

In 2015 CERC researchers, alongside graduate students and faculty in the AI Lab, began developing a system that detects whether someone used the alcohol hand dispenser that sits on the wall next to every hospital room entrance.

Their system relies on computer vision, a rapidly progressing domain of artificial intelligence used in the automotive and other industries.

“If computer vision can detect when drivers initiate dangerous lane changes and safely control vehicular steering, can it similarly analyze motion to detect unintended deviations in important clinician behaviors or patient activities?” asked Milstein and Li’s research team in a New England Journal of Medicine article.

AI systems that take advantage of computer vision are relatively inexpensive. Using them, the team has shown it can achieve greater than 95 percent accuracy in detecting when staff inadvertently skip the hand sanitizer before entering patient rooms.
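
In simplified form, the check itself can be sketched as follows (the coordinates, zones, and threshold below are invented for illustration; the actual system uses trained computer-vision models on video, not hand-written rules):

    # Sketch of the hand-hygiene check: given a person's tracked floor path,
    # flag an omission if they reach the room door without having passed
    # close to the wall-mounted dispenser. All numbers are hypothetical.

    DISPENSER = (1.0, 0.2)  # dispenser location, meters
    DOOR = (1.5, 0.0)       # room entrance, meters
    NEAR = 0.5              # proximity threshold, meters

    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    def hygiene_omitted(track):
        """True if the track reaches the door without visiting the dispenser."""
        visited_dispenser = False
        for point in track:
            if dist(point, DISPENSER) <= NEAR:
                visited_dispenser = True
            if dist(point, DOOR) <= NEAR:
                return not visited_dispenser
        return False  # never reached the room entrance

    # A person who walks straight to the door without sanitizing:
    print(hygiene_omitted([(4.0, 2.0), (2.5, 1.0), (1.5, 0.1)]))  # True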

The vision of making excellent care more effective and efficient also targets behaviors that affect lifelong health trajectories. In collaboration with Stanford researchers in child development and pediatrics, the team is testing how computer vision can let mothers know if their eyes inadvertently drift to their smartphone screen instead of responsively returning their infant’s gaze.

The hope, Milstein says, is to unite technology and human care. “By mobilizing emerging science and technology from engineering, behavioral sciences, and medicine, Stanford can address a seemingly intractable national challenge to make affordable all forms of human caring that powerfully affect health.”

Humans and AI, Not Humans versus AI

SONOO THADANEY, MBA (left) and ABRAHAM VERGHESE, MD

“I hold out hope that artificial intelligence and machine-learning algorithms will transform our experience, particularly if natural-language processing and video technology allow us to capture what is actually said and done in the exam room,” writes Abraham Verghese, MD, professor of medicine and founding faculty director of the Stanford Presence Center.

“The physician focuses on the patient and family, and if there is a screen in the room, it is to summarize or to share images with the patient; by the end of the visit, the progress notes and billing are done.

“But AI applications will help us only if we vet all of them for their unintended consequences. Technology that is not subject to such scrutiny doesn’t deserve our trust, nor should we ever allow it to be deeply integrated into our work,” Verghese continues in a May 2018 article that appeared in The New York Times Magazine.

That sentiment is behind a key focus for Presence, a center that emphasizes the value of the human connection in the high-wire balancing act between high tech and high touch.

Presence aims to ensure that patients, clinicians, funders, legislators, and other stakeholders are at the table as equitable and inclusive AI solutions are created and deployed in health care.

To that end, Presence presented two symposia during 2018. In April, Jonathan Chen, MD, assistant professor of biomedical informatics, was a leader of the first symposium, “Human Intelligence and Artificial Intelligence in Medicine,” which addressed augmented intelligence of humans and machines for diagnostics.

The 350 physicians, business leaders, policymakers, social and behavioral scientists, venture capitalists, and political activists in attendance were challenged to determine how to ensure that humans are augmented by AI in defining and delivering compassionate services.

On that subject Verghese says, “Pitting humans against machines is not the point. Rather, how best to relevantly engage both for the sum to be greater than the parts should be the focus.”

“Machines do many things very well, but they really can’t do the caring work, so how do we augment the two preemptively, proactively, and equitably for the outcome that we all seek?” he asks.

The second symposium, “Artificial Intelligence in Medicine: Inclusion and Equity,” held in August, drew 275 attendees from around the world. Presence executive director Sonoo Thadaney, MBA, co-chair of the National Academy of Medicine’s Working Group on AI in Healthcare, was one of the symposium leaders. Acknowledging the potential unintended consequences of AI in medicine, she examined how to prevent and manage the possible exacerbation of inequity and exclusion in health care.

Thadaney speaks of a huge inequity that looms depending on an individual’s circumstances, saying: “We cannot have a world where technology creates greater inequity such that those of us with privilege have access to second opinions and concierge physicians, and the rest of the planet ends up with medicine that is meted out with the efficiency and emptiness of fast food. We cannot afford a health care apartheid.”

The Gordon and Betty Moore Foundation and the Robert Wood Johnson Foundation support Presence by funding the symposia as well as another innovative program that began at the end of 2018: the AI in Medicine Inclusion & Equity (AiMIE) 2018 Seed Grants Program. The AiMIE program provides initial funding for projects seeking equitable and inclusive frameworks for AI in medicine.

FAIR Compliant Biomedical Metadata Templates | CEDAR

MARK MUSEN, MD, PHD

Making Large Data Easily Available Online
Several years ago, Mark Musen, MD, PhD, wrote: “The ultimate Big Data challenge lies not in the data, but in the metadata — the machine-readable descriptions that provide data about the data. It is not enough to simply put data online; data are not usable until they can be ‘explained’ in a manner that both humans and computers can process.”

Musen is a professor of biomedical informatics and director of the Stanford Center for Biomedical Informatics Research. He is also the head of CEDAR, the Center for Expanded Annotation and Retrieval, which helps researchers comply with requirements to archive their data so others can understand and use them. In a recent interview, Musen provided clarity about the problem of metadata.

Why is it difficult for researchers to comply with requirements to publish their metadata?
The greatest challenge of this whole enterprise is the problem of “What’s in it for me?” We reward scientists for authoring journal articles and for creating PDFs, but we don’t have a system that recognizes the data contributions that scientists make. We need to change the culture so that when other investigators report secondary analyses of data, or when data sets are re-explored and then lead to new discoveries, there is a benefit to the original investigator other than being acknowledged in someone else’s paper. Currently, investigators don’t have the motivation to spend a lot of time making their experimental data easily available online, and they generally lack tools to enable them to do so in a standardized, reproducible fashion.

Are there problems with data currently in repositories?
We’re starting to see an emphasis not just on putting the data into repositories but on actually doing a good job of it. The National Center for Biotechnology Information (NCBI) maintains most of the NIH repositories for experimental data, but it generally does no more than make sure that the forms are filled in. So NCBI databases contain lots of horrible stuff; for instance, some 25 percent of the metadata values that are supposed to be numeric don’t actually parse as numbers.
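
The failure Musen describes is mechanical to detect. A short sketch (with made-up values, not an actual NCBI dataset) of the kind of validation that flags non-numeric entries in a numeric field:

    # Hypothetical values submitted for a numeric metadata field such as "age".
    submitted = ["34", "52.5", "adult", "N/A", "sixty", "47"]

    def parses_as_number(value: str) -> bool:
        try:
            float(value)
            return True
        except ValueError:
            return False

    bad = [v for v in submitted if not parses_as_number(v)]
    print(f"{len(bad)}/{len(submitted)} values fail to parse: {bad}")
    # -> 3/6 values fail to parse: ['adult', 'N/A', 'sixty']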

Is this where CEDAR has a role to play?
Precisely. The idea of CEDAR is to make it easier and more attractive for investigators to publish their data because more science is going to come out of it if they do. CEDAR has a whole library of templates that correspond to “minimal information models” for describing different classes of experiments. And we have technology that makes it easy to fill in one of these templates to describe your particular experiment when you are ready to upload your data sets to a repository. By filling in the template, you create standardized, searchable metadata that future investigators will use to locate the data and to make sense of what you have done. Using a cache of metadata that it already has stored, CEDAR can make suggestions as you’re filling out a template to accelerate the process of creating the metadata in the first place.
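
As a rough illustration of the idea (this is not CEDAR’s actual template format; the fields and vocabularies below are invented), a minimal template and a filled-in instance might look like:

    # Invented "minimal information" template for a sequencing experiment,
    # loosely in the spirit of CEDAR's fill-in-the-template workflow.
    template = {
        "organism": {"type": "controlled_term",
                     "vocabulary": ["Homo sapiens", "Mus musculus"]},
        "tissue": {"type": "controlled_term",
                   "vocabulary": ["liver", "blood", "brain"]},
        "assay": {"type": "controlled_term",
                  "vocabulary": ["RNA-seq", "ChIP-seq"]},
        "replicate_count": {"type": "integer"},
    }

    # A filled-in instance: standardized values a repository can index and search.
    instance = {
        "organism": "Homo sapiens",
        "tissue": "blood",
        "assay": "RNA-seq",
        "replicate_count": 3,
    }

    # Validate the instance against the template before uploading the data set.
    for name, spec in template.items():
        value = instance[name]
        if spec["type"] == "controlled_term":
            assert value in spec["vocabulary"], f"{name}: {value!r} not in vocabulary"
        elif spec["type"] == "integer":
            assert isinstance(value, int), f"{name}: expected an integer"
    print("metadata instance conforms to template")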

How is CEDAR being used today?
CEDAR helps investigators put data sets online — with well-described metadata — that will allow future scientists to perform new analyses that may allow them to make new discoveries. Our collaborations with several large research consortia show that it’s not all that difficult for investigators to do a great job of annotating their data sets in a way that will benefit the entire scientific community. Immunologists in the Antibody Society use CEDAR to upload their data and metadata to repositories at the NIH. Scientists developing the Library of Integrated Network-Based Cellular Signatures use CEDAR in association with their own data coordinating and integration center. The Irish Health Research Board and the Dutch Clinical Funding Agency are evaluating using CEDAR to review proposed metadata before making funding decisions about new studies.
