
caBIG(TM) Projects

Tissue Banks and Pathology Tools Workspace

The University of Pennsylvania is an active participant in the caBIG Tissue Banks and Pathology Tools (TBPT) Workspace. BMIF members David Fenstermacher, Tara McSherry, Casey Overby, Nate DiGiorgio, Vishal Nayak, Kevin Lux, David Wang, John Quigley, and David Birtwell, along with TTAB members Michael Feldman, Lisa Miranda, and Laura Smicker, all contribute to the projects in this workspace.

The overall goal of the TBPT Workspace is to facilitate information sharing. Its efforts target a more effective way for researchers to locate and analyze tissue for use in cancer trials. To achieve this goal, the TBPT Workspace is developing an extensive suite of tools called the Tissue Banking and Pathology Tools Suite. These tools will allow a researcher to access a GRID of participating cancer centers and query the "Virtual Tissue Repository".

Cancer Text Information Extraction System (University of Pittsburgh)

The Cancer Text Information Extraction System (caTIES) Project is one of the components of the TBPT Suite; it concentrates on information extraction to help researchers gain access to tissue. caTIES extracts information from free-text Surgical Pathology Reports (SPRs) and will allow researchers to query and browse the extracted information. Once a researcher has found information in the SPRs that is pertinent to a study, caTIES will also provide a means to request the tissue from the caBIG community.

PHASE I
Phase I of caTIES has been completed. In this phase, a subset of Surgical Pathology Reports was run through the caTIES pipeline, and the resulting de-identified data was stored in a MySQL database that identified and authenticated users could query. The Phase I system supported queries for case sets through text and concept searches with age and sex constraints. Boolean searching (AND, OR, and NOT) and searches using the negation engine, which separates concepts a report asserts from concepts it negates (for example, "no evidence of malignancy"), were available to make searches as precise as possible.
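The Boolean, negation-aware, and age/sex-constrained searches described above can be illustrated with a small sketch. It assumes a hypothetical, simplified record in which each de-identified report carries an age, a sex, the concepts coded from its text, and the subset of those concepts marked as negated; the field and function names are illustrative only and are not the caTIES schema or query interface.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical de-identified SPR record (illustrative, not the caTIES schema)."""
    report_id: str
    age: int
    sex: str
    concepts: set = field(default_factory=set)          # concepts coded from the text
    negated_concepts: set = field(default_factory=set)  # concepts flagged as negated


def matches(report, all_of=(), any_of=(), none_of=(), min_age=None, max_age=None, sex=None):
    """AND over all_of, OR over any_of, NOT over none_of, plus age/sex constraints.

    Negated mentions never count as present, so a report stating
    "no evidence of carcinoma" does not satisfy a search for carcinoma.
    """
    asserted = report.concepts - report.negated_concepts
    if not all(c in asserted for c in all_of):
        return False
    if any_of and not any(c in asserted for c in any_of):
        return False
    if any(c in asserted for c in none_of):
        return False
    if min_age is not None and report.age < min_age:
        return False
    if max_age is not None and report.age > max_age:
        return False
    if sex is not None and report.sex != sex:
        return False
    return True


def search(reports, **criteria):
    """Return the IDs of the reports that make up the matching case set."""
    return [r.report_id for r in reports if matches(r, **criteria)]


reports = [
    Report("R1", 62, "F", {"breast", "ductal carcinoma"}),
    Report("R2", 55, "F", {"breast", "ductal carcinoma"}, {"ductal carcinoma"}),
    Report("R3", 70, "M", {"prostate", "adenocarcinoma"}),
]
print(search(reports, all_of=["breast", "ductal carcinoma"], sex="F", min_age=40))  # ['R1']
```

Report R2 is excluded because its only carcinoma mention was negated, which is the kind of false positive a negation step is meant to prevent.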

Part of the caTIES pipeline is the DE-ID software, which scrubs protected health information (PHI) from the SPRs that are run through caTIES. As a result, the data sets returned to end users contain no PHI, and end users work only with de-identified data.
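To make "scrubbing PHI" concrete, here is a toy sketch that replaces a few identifier types with placeholder tags. It is an illustration only, under the assumption of a handful of regular-expression patterns and a caller-supplied list of known names; it is not how the DE-ID software works, and a real de-identification tool must cover many more identifier types.

```python
import re

# Illustrative patterns only (assumed formats); real PHI coverage is far broader.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN:?\s*\d+\b", re.IGNORECASE), "[MRN]"),
]


def scrub(text, known_names=()):
    """Replace PHI-like substrings in report text with placeholder tags."""
    # Names known from the report header are removed first, case-insensitively.
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    # Then each pattern-based identifier type is replaced.
    for pattern, tag in PHI_PATTERNS:
        text = pattern.sub(tag, text)
    return text


report = "Patient Jane Doe, MRN 1234567, seen 03/14/2005. Call 215-555-0100."
print(scrub(report, known_names=["Jane Doe"]))
# Patient [NAME], [MRN], seen [DATE]. Call [PHONE].
```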

PHASE II
We are currently in Phase II of the caTIES project. The objective of Phase II is to enhance the functionality of the caTIES beta release, which was completed in Phase I of development.

caTIES was chosen as one of the four initial caBIG reference implementations for the caGRID technology. For Phase II, the software was installed and tested at three adopter institutions. The archives at Penn and UPMC and at the two new adopter sites (Washington University and Thomas Jefferson University) will be placed onto the NCI's caBIG Informatics Grid for querying across institutions. Because all of the data viewable by end users is de-identified, HIPAA requirements continue to be met and patient confidentiality is not violated. Phase II of the caTIES project created a cross-institutional search tool that allows investigators to locate tissues for study across institutions. To obtain the tissues, investigators must then find collaborators at each institution and submit the appropriate IRB and Material Transfer documentation. In later phases, we aspire to extend GRID communication across the entire caBIG community.

Paraffin-Embedded Tissue Archive (University of Pennsylvania)

The Paraffin-Embedded Tissue Project is an extension of caTIES. Its goal is to obtain pathology data for up to 500,000 paraffin-block archived tissues at the University of Pennsylvania and upload that data into the caTIES database. Each tissue block has a corresponding microfiche file that contains information about the tissue and other pathological information. The project scanned the microfiche to create TIFF images of all corresponding pathology reports and used OCR (Optical Character Recognition) to extract accession/subject IDs and the full report text. Through the caTIES query interface, these tissue samples will be made available to the caBIG community.
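The extraction step can be sketched as pulling an accession number out of noisy OCR text and pairing it with the source TIFF image. The accession format below (a letter, a two-digit year, and a serial number, e.g. "S48-01234") and the record fields are hypothetical assumptions for illustration; the actual Penn accession format and database layout are not described here.

```python
import re

# Hypothetical accession format, tolerant of stray OCR spaces: "S 48 - 1234".
ACCESSION = re.compile(r"\b([A-Z])\s?(\d{2})\s?-\s?(\d{3,6})\b")


def extract_accession(ocr_text):
    """Return the first accession number in the OCR text, normalized, or None."""
    match = ACCESSION.search(ocr_text.upper())
    if match is None:
        return None
    prefix, year, serial = match.groups()
    # Normalize to a fixed-width serial so IDs compare consistently.
    return f"{prefix}{year}-{int(serial):05d}"


def build_record(tiff_path, ocr_text):
    """Pair the extracted ID and full text with the scanned TIFF image (hypothetical fields)."""
    return {
        "accession": extract_accession(ocr_text),
        "tiff_path": tiff_path,
        "full_text": ocr_text,
    }


print(extract_accession("ACC NO. s 48 - 1234 GROSS DESCRIPTION"))  # S48-01234
```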

The Paraffin-Embedded Tissue Project is significant because Surgical Pathology Departments throughout the country hold vast quantities of paraffin-embedded tissue in their archives, and most of these archives have little or no associated annotation that could be used to provide access to the tissue. The success of the project at Penn would open the door to making paraffin archives accessible at all participating cancer centers across the caBIG community.

To date, Penn's microfiche of paraffin-related Surgical Pathology Reports (SPRs) from 1948-1988 have been scanned and their accession numbers manually keyed. An extensive quality assurance (QA) process is now underway. The first half of the QA, an analysis of the OCR quality of these files, is complete: more than 2,500 SPRs were examined, and more than 85% of them were deemed satisfactory for caTIES processing. The second half of the QA, determining how well caTIES can code these OCR-derived SPRs, is in progress. A test set has been run through the Phase I version of caTIES, and the same test set will be re-coded with the caTIES Phase II final release to compare coding quality. The database will then be populated with all the paraffin files, coded through the caTIES pipeline, and each database record will be linked to its corresponding SPR TIFF image, which will be accessible only with proper IRB authorization.

caTISSUE Core (Washington University)

The caTISSUE Core Project is the first step in the development of the TBPT Suite. caTISSUE Core has been designed to meet the basic requirements of caTISSUE and to be fully functional while the more comprehensive TBPT Suite is being developed. caTISSUE Core has been described as "a core solution for biospecimen inventory, tracking, and basic annotation that may be used by biospecimen resource facilities".

caTISSUE Core will create the resources Penn needs for a virtual tissue bank to facilitate distribution of tissue and tumor samples for basic and clinical research applications. The benefits of creating a virtual tissue bank are as follows: (1) established infrastructures for tumor and tissue collection can be maintained by individual tissue banks with mapping to caTISSUE; (2) new tissue collection cores, including the Department of Pathology and Laboratory Medicine's and the Abramson Cancer Center's Tissue Bank, can be established for tissues and tumors not currently being collected, with caTISSUE serving as the main tool for tissue data collection; (3) all tissue and tumor samples will be tracked and monitored within a single database once procedures are established to systematically transfer data between individual tissue banks and the caTISSUE central repository; (4) customized data storage and end-user interfaces can be developed with caTISSUE to make tissue data available to internal and external researchers; and (5) although several laboratories will store tumor and tissue samples outside the Central Tissue Bank, integrating the established tissue banks into a virtual bank through caTISSUE will present a single coordinated effort across the Abramson Cancer Center, the Department of Pathology and Laboratory Medicine, and the School of Medicine.

Currently, the BMIF is actively involved in the development of caTISSUE Core. caTISSUE Core Phase 1 and Phase 1b have been completed. In Phase 1, the BMIF participated in the requirements gathering and data modeling activities that produced the alpha version of caTISSUE Core v1.0. In Phase 1b, the BMIF tested the alpha and beta modules and code delivered by the caTISSUE developers at Washington University, identified additional requirements and enhancements, and wrote end-user documentation.

Tissue Banking and Pathology Tools Suite (Washington University)

The TBPT Suite is an integrated system combining caTISSUE Core, caTIES, and the Clinical Annotation Engine (CAE), and it is currently under development at Washington University. Thus far, the BMIF has been involved in testing the caTISSUE Core v1.1 alpha, beta, and final releases. Currently, the BMIF is testing the software enhancements and user interface changes that will make up caTISSUE Core v1.2. The final release of caTISSUE Core v1.2 is expected in late spring, and it will become part of the TBPT Suite. The BMIF has also been highly involved in gathering requirements for the TBPT Suite. The alpha version of the Suite is expected to be released in late May. Phase II of the project will begin in early summer and will result in the release of TBPT Suite v1.0. The BMIF has been selected as a Phase II adopter and plans to adopt all three parts of the TBPT Suite v1.0.

Clinical Trial Management Systems Workspace

The Clinical Trial Management Systems Workspace is developing a comprehensive set of modular, interoperable, and standards-based tools designed to meet the diverse clinical trials management needs of the Cancer Center community. The tools will be configurable to meet the needs of Cancer Centers with little or no clinical data management systems in place as well as those with robust systems, and will take into account the diversity of clinical research activities and local practices among these Cancer Centers. The caBIG(TM) core principles of open source, open access, open development, and federation of data sources guide all new tool and product development. For this Workspace in particular, interoperability and modular development are key, because solutions are likely to consist of a flexible assembly of compatible tools drawn from a rich collection developed by the Workspace, as well as existing commercial and locally developed solutions that the community has made caBIG(TM) compatible.

Cancer Central Clinical Database (National Cancer Institute Center for Bioinformatics)

The Abramson Cancer Center and BMIF have adopted NCI’s Cancer Central Clinical Database System (C3DS) Project. C3DS enables the cancer research community to unify systems and data and to improve work processes that facilitate clinical trials activation and execution. Developed in a partnership among the NCI Center for Cancer Research (CCR), the NCI Center for Bioinformatics (NCICB), and a commercial software vendor, C3DS provides the cancer research community with the infrastructure to collect and manage clinical trials data. C3DS consists of five modules, one of which is C3D.

The heart of C3D is a library of standardized templates, electronic Case Report Forms (eCRFs), for collecting the data required by specific clinical protocols. These templates can be tailored for reuse across multiple studies, greatly accelerating the study implementation cycle. Oracle Clinical supports C3D with clinical trial definition, data capture, multi-site reporting, and standardization of data definitions and usage. Oracle Clinical’s Remote Data Capture (RDC) provides a user-friendly interface that allows local and remote data entry and electronic confirmation of source data verification. The Integrated Review tool gives authorized investigators ad-hoc query, reporting, analysis, and web tools for real-time access to clinical data within and across clinical studies. C3D is currently being used in CCR’s Medical Oncology Clinical Research Unit (MOCRU) as well as for a number of SPORE studies.

The main objective of the C3D is to improve clinical trials activation and execution by providing a large-scale and efficient Internet-based clinical trials information management system available for use by multiple cancer research centers across the country.

Data Sharing and Intellectual Capital Workspace

Dr. Howard Bilofsky actively participates in the Data Sharing and Intellectual Capital Workspace. This Workspace addresses issues related to data sharing and intellectual capital associated with caBIG(TM) and develops recommendations for the caBIG(TM) Oversight Board. This process includes suggesting standards, drafting policy documents, and writing white papers that help clarify caBIG(TM)'s position on intellectual property issues. The Workspace also provides expert guidance on specific concerns about these issues raised within caBIG(TM) Workspaces and individual Project Teams.

Documentation and Training Workspace

David Wang from the BMIF group participates in the Documentation and Training Workspace by evaluating learning management tools, participating in conference calls, and mentoring documentation development. The mission of this Workspace is to facilitate widespread adoption, dissemination, and use of caBIG(TM) interoperable tools, standards, and data sets within the larger cancer and biomedical communities. Its primary goal is to support the creation and dissemination of documentation and training materials for caBIG(TM)-related projects and community-wide resources.

Integrative Cancer Research Workspace

The Integrative Cancer Research Workspace is producing modular and interoperable tools and interfaces that provide for integration between biomedical informatics applications and data. This will ultimately enable translational and integrative research by providing for the integration of clinical and basic research data. The Workspace is developing a software-engineered, well-documented and validated biomedical informatics toolset for use throughout the research community.

The Abramson Cancer Center and BMIF are adopting several tools to serve as prototypes of Gold-Level Compliance for caBIG(TM). The tools are:

geWorkbench (Columbia University)

This suite of tools for loading, visualizing, and analyzing gene expression data will provide access to data from any repository with a MAGE-OM API, such as caArray.

geWorkbench (previously caWorkbench) is a Java-based open-source platform for integrated genomics. Using a component architecture, geWorkbench allows individually developed plug-ins to be configured into complex bioinformatics applications.
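The component idea, independently written plug-ins assembled into a larger application, can be sketched generically. The registry, decorator, and two expression-data steps below are hypothetical and written in Python for brevity; geWorkbench itself is Java-based and its actual plug-in API differs.

```python
import math
from typing import Callable, Dict, List

# Hypothetical registry mapping plug-in names to analysis steps.
PLUGINS: Dict[str, Callable[[List[float]], List[float]]] = {}


def plugin(name):
    """Decorator that registers an independently developed analysis step."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@plugin("log2")
def log2_transform(values):
    """Log2-transform raw expression values."""
    return [math.log2(v) for v in values]


@plugin("center")
def mean_center(values):
    """Subtract the mean so values are centered on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def build_application(step_names):
    """Compose registered plug-ins, in the given order, into one pipeline."""
    steps = [PLUGINS[name] for name in step_names]

    def run(values):
        for step in steps:
            values = step(values)
        return values

    return run


pipeline = build_application(["log2", "center"])
print(pipeline([1.0, 2.0, 4.0, 8.0]))  # [-1.5, -0.5, 0.5, 1.5]
```

Because each step only agrees on a shared input/output shape, new analyses can be added or reordered without changing the others, which is the main appeal of a component architecture.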

RProteomics (Duke University)

In this project, termed RProteomics, Duke will build open-source tools and develop standards for proteomics data analysis. These tools, based on the R statistical engine, will be made available on the caBIG(TM) grid for research use and as a reference for implementing similar services. The project includes development of a Statistical Model of Spectra, a proposed statistical component for the MIAPE (Minimal Information About a Proteomics Experiment) object model, and of JavaR, a reusable component that gives software developers programmatic access to R from Java. The results of this project will be documented and presented to the rest of the caBIG(TM) community so that the lessons learned can be leveraged by future caBIG(TM) development efforts.

Protein Information Resource (Georgetown University)

The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283,000 sequences covering the entire taxonomic range. PIR is also a member of the UniProt consortium, the central international resource of protein sequence and function that unifies the PIR, Swiss-Prot, and TrEMBL databases. For caBIG(TM), the PIR database will be grid-enabled to demonstrate how such a rich data source can be discovered and consumed in a grid environment for the annotation of experiments.

Completed Projects:

RProteomics (Duke University)
Protein Information Resource (Georgetown University)

In Vivo Imaging Workspace

Dr. Michael Feldman and Dr. Curtis Langlotz actively participate in the In Vivo Imaging Workspace. In vivo imaging, from the molecular level to small-animal imaging to clinical imaging of patients, is an essential component of basic and clinical cancer research. The caBIG(TM) In Vivo Imaging Workspace will focus on identifying ways in which the wealth of information provided by such imaging, performed at academic and other research centers across the country, can be shared, optimized, and most effectively integrated into the ongoing effort to reduce suffering and death from cancer. The In Vivo Imaging Workspace was launched during the first week of October 2005, and participation remains open. Initial efforts will involve enlisting the widest possible representation from cancer centers, industry, organizations, and standards-setting groups. Among the earliest cooperative tasks will be identifying the overall aims and the most urgent challenges in cancer imaging and imaging data sharing. The workspace will define the needs for, and participate in creating, optimizing, and validating, tools and methods to extract meaning from in vivo imaging data. In this process, participants will also help define, refine, and evolve interoperable in vivo imaging informatics data standards. The in vivo imaging technologies and modalities addressed will include systems for research and clinical imaging of live patients and animals (including single-cell organisms) used as model systems for human disease.

Strategic Planning Workspace

Craig Street, MS, and David Birtwell from the BMIF group both participate in the Strategic Planning Workspace. This Workspace assists the caBIG(TM) Oversight Board with strategic planning and vision development activities. Participants provide strategic insights regarding caBIG(TM)'s potential role in, relationship to, and interface with other initiatives. The products of these endeavors include white papers and planning documents that help identify and prioritize additional activities for the caBIG(TM) project as a whole.