知方号

LIDC

The Cancer Imaging Archive

LIDC-IDRI | Data from The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans

DOI: 10.7937/K9/TCIA.2015.LO9QL9SX | Data Citation Required | Image Collection

Location | Species | Subjects | Data Types | Cancer Types | Size | Supporting Data | Status | Updated
Chest | Human | 1,010 | CT, DX, CR | Lung Cancer | 133.16 GB | Clinical, Image Analyses, Software/Source Code | Public, Complete | 2023/09/21

Summary

The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.

Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1,018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule >= 3 mm," "nodule < 3 mm," and "non-nodule >= 3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.

Note: The TCIA team strongly encourages users to review pylidc and the standardized DICOM representation of the TCIA LIDC-IDRI annotations (DICOM-LIDC-IDRI-Nodules) before developing custom tools to analyze the XML version of the annotations/segmentations included in this dataset.

Data Access

Version 4: Updated 2023/09/21

9/21/2023 Maintenance notes: corrected inadvertent inclusion of third-party-generated files in primary-data download manifest

Title | Data Type | Format | Access Points | Subjects | Studies | Series | Images | License
Images | CT, DX, CR | DICOM | Download (133.15 GB); Search (download requires NBIA Data Retriever) | 1,010 | 1,308 | 1,308 | 244,527 | CC BY 3.0
DICOM Metadata Digest | - | CSV | Download (313.96 KB) | - | - | - | - | CC BY 3.0
Radiologist Annotations/Segmentations (Note: see pylidc for assistance using these data) | - | XML and ZIP | Download (8.62 MB) | - | - | - | - | CC BY 3.0
Nodule Counts by Patient | - | XLSX | Download (40.35 KB) | - | - | - | - | CC BY 3.0
Patient Diagnoses | - | XLS | Download (45 KB) | - | - | - | - | CC BY 3.0

Analysis Results Using This Collection: Pulmonary-Nodules-Segmentation, QIN-LungCT-Seg, Image-Compression-Simulation, DICOM-LIDC-IDRI-Nodules, Radiomic-Feature-Standards, SAROS

Additional Resources for this Dataset

The following external resources have been made available by the data submitters.  These are not hosted or supported by TCIA, but may be useful to researchers utilizing this collection.

Nodule Size List

See pylidc for assistance using the XML data

The NCI Cancer Research Data Commons (CRDC) provides access to additional data and a cloud-based data science infrastructure that connects data sets with analytics tools to allow users to share, integrate, analyze, and visualize cancer research data.

Imaging Data Commons (IDC) (Imaging Data)

Citations & Data Usage Policy

Data Citation Required: Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution must include the following citation, including the Digital Object Identifier:

Data Citation

Armato III, S. G., McLennan, G., Bidaut, L., McNitt-Gray, M. F., Meyer, C. R., Reeves, A. P., Zhao, B., Aberle, D. R., Henschke, C. I., Hoffman, E. A., Kazerooni, E. A., MacMahon, H., Van Beek, E. J. R., Yankelevitz, D., Biancardi, A. M., Bland, P. H., Brown, M. S., Engelmann, R. M., Laderach, G. E., Max, D., Pais, R. C., Qing, D. P. Y., Roberts, R. Y., Smith, A. R., Starkey, A., Batra, P., Caligiuri, P., Farooqi, A., Gladish, G. W., Jude, C. M., Munden, R. F., Petkovska, I., Quint, L. E., Schwartz, L. H., Sundaram, B., Dodd, L. E., Fenimore, C., Gur, D., Petrick, N., Freymann, J., Kirby, J., Hughes, B., Casteele, A. V., Gupte, S., Sallam, M., Heath, M. D., Kuhn, M. H., Dharaiya, E., Burns, R., Fryd, D. S., Salganicoff, M., Anand, V., Shreter, U., Vastagh, S., Croft, B. Y., Clarke, L. P. (2015). Data From LIDC-IDRI [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX

Acknowledgement

Please be sure to include the following attribution in any publications or grant applications along with references to appropriate LIDC publications:

“The authors acknowledge the National Cancer Institute and the Foundation for the National Institutes of Health, and their critical role in the creation of the free publicly available LIDC/IDRI Database used in this study.”

Detailed Description

Reader Annotation and Markup

These links help describe how to use the .XML annotation files which are packaged along with the images in The Cancer Imaging Archive.  The option to include annotation files in the download is enabled by default, so the XML described here will be included when downloading the LIDC-IDRI images unless you specifically uncheck this option.  If you are only interested in the XML files or you have already downloaded the images you can obtain them here:

LIDC-XML-only.zip

The following documentation explains the format and other relevant information about the XML annotation and markup files:

XML File Documentation

XML Base Schema (xsd format in zip) - This file is called “voi array.xsd”, and is central in defining tumors greater than or equal to 3 mm in the datasets as well as defining the loci of non-nodules.

Annotated XML File

LIDC Radiologist Instructions for Spatial Location and Extent Estimates
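As a rough illustration of the kind of structure those documents describe, the sketch below parses a tiny hand-written fragment with Python's standard library. The element names (`readingSession`, `unblindedReadNodule`, `characteristics`, `roi`, `edgeMap`, `nonNodule`) and the `http://www.nih.gov` namespace are assumptions about the file layout, and every value in the sample is invented; check them against the schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names, namespace, and values are assumptions.
SAMPLE = """<LidcReadMessage xmlns="http://www.nih.gov">
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 001</noduleID>
      <characteristics>
        <lobulation>1</lobulation>
        <spiculation>2</spiculation>
      </characteristics>
      <roi>
        <imageZposition>-125.0</imageZposition>
        <edgeMap><xCoord>310</xCoord><yCoord>365</yCoord></edgeMap>
        <edgeMap><xCoord>311</xCoord><yCoord>366</yCoord></edgeMap>
        <edgeMap><xCoord>310</xCoord><yCoord>367</yCoord></edgeMap>
      </roi>
    </unblindedReadNodule>
    <nonNodule>
      <nonNoduleID>Non-nodule 001</nonNoduleID>
    </nonNodule>
  </readingSession>
  <readingSession>
    <unblindedReadNodule>
      <noduleID>7</noduleID>
      <roi>
        <imageZposition>-125.0</imageZposition>
        <edgeMap><xCoord>312</xCoord><yCoord>366</yCoord></edgeMap>
      </roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>"""

NS = {"lidc": "http://www.nih.gov"}


def summarize_sessions(xml_text):
    """Return one summary dict per reading session found in the XML text."""
    root = ET.fromstring(xml_text)
    summary = []
    for session in root.findall("lidc:readingSession", NS):
        nodules = []
        for nodule in session.findall("lidc:unblindedReadNodule", NS):
            ratings = nodule.find("lidc:characteristics", NS)
            nodules.append({
                "id": nodule.findtext("lidc:noduleID", namespaces=NS),
                # Marks without a characteristics element carry no ratings.
                "has_ratings": ratings is not None,
                # Number of outline points recorded across all ROIs.
                "points": len(nodule.findall("lidc:roi/lidc:edgeMap", NS)),
            })
        summary.append({
            "nodules": nodules,
            "non_nodules": len(session.findall("lidc:nonNodule", NS)),
        })
    return summary


for index, session in enumerate(summarize_sessions(SAMPLE)):
    print(index, session)
```

The session index printed here is only a position within one file. It should not be treated as a radiologist identity across files.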

Annotation and Markup Issues/Comments

For a subset of approximately 100 cases from among the initial 399 cases released, inconsistent rating systems were used among the 5 sites with regard to the spiculation and lobulation characteristics of lesions identified as nodules > 3 mm. The XML nodule characteristics data as it exists for some cases will be impacted by this error. We apologize for any inconvenience.

Also note that the XML files do not store radiologist annotations in a manner that allows for a comparison of individual radiologist reads across cases (i.e., the first reader recorded in the XML file of one CT scan will not necessarily be the same radiologist as the first reader recorded in the XML file of another CT scan).

March 2010: Contrary to previous documentation, the correct ordering for the subjective nodule lobulation and nodule spiculation rating scales stored in the XML files is 1=none to 5=marked. The issue of consistency noted above still remains to be corrected.

On 2012-03-21 the XML associated with patient LIDC-IDRI-0101 was updated with a corrected version of the file.

As of May 2018, please note that errors exist for two XML files, 044.xml and 191.xml, where one reader recorded one nodule as a “nodule >= 3 mm” but neglected to assign ratings for the nodule characteristics. On June 28, 2018 the files were updated with an explanation at the point of the error in the XML files.

Subject LIDC-IDRI-0396 (139.xml) had an incorrect SOP Instance UID for position 1420. This was fixed on June 28, 2018.

Subject LIDC-IDRI-0510 has an assigned value of 5 for the internalStructure attribute in 187/255.xml. There is no 5th category for internalStructure, so this should be considered invalid.

There are 8 patients in the collection with two different timepoint CT scans. We realized this after completing the LIDC-IDRI project (our intent was just to have a single timepoint for any one patient). Users are free to use either scan (or both scans).

Nodule-Specific Details

Nodule size list for the LIDC public cases – This link provides a list of available cases and the associated size of each identified nodule.

lidc-idri nodule counts (6-23-2015).xlsx – This link provides an accounting of the total number of nodules for each LIDC-IDRI patient.

Diagnosis Data

For a limited set of cases, LIDC sites were able to identify diagnostic data associated with the case.

tcia-diagnosis-data-2012-04-20.xls

Note: This project has concluded and we are not able to obtain any additional diagnosis data beyond what is available in the above link.

Data was collected for as many cases as possible and is associated at two levels:

Diagnosis at the patient level (diagnosis is associated with the patient)

Diagnosis at the nodule level (where possible)

At each level, data was provided as to whether the nodule was:

Unknown (no data is available)

Benign or non-malignant disease

A malignancy that is a primary lung cancer

A metastatic lesion that is associated with an extra-thoracic primary malignancy
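The two levels and four categories above could be modeled as follows. This is only a sketch: the class names, field names, and label strings are hypothetical and do not reflect how the spreadsheet actually encodes its values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Diagnosis(Enum):
    # Label strings are illustrative, not the spreadsheet's own encoding.
    UNKNOWN = "unknown"
    BENIGN = "benign or non-malignant disease"
    PRIMARY_LUNG_CANCER = "malignant, primary lung cancer"
    METASTATIC = "malignant, metastatic from an extra-thoracic primary"


@dataclass
class CaseDiagnosis:
    patient_id: str
    patient_level: Diagnosis = Diagnosis.UNKNOWN
    # Nodule-level entries exist only where possible, so this may stay empty.
    nodule_level: Dict[str, Diagnosis] = field(default_factory=dict)

    def is_malignant(self) -> bool:
        """True when the patient-level diagnosis is either malignant category."""
        return self.patient_level in (
            Diagnosis.PRIMARY_LUNG_CANCER,
            Diagnosis.METASTATIC,
        )


# Placeholder identifiers, not real cases from the collection.
case = CaseDiagnosis(
    "example-patient",
    Diagnosis.METASTATIC,
    {"nodule-1": Diagnosis.METASTATIC},
)
print(case.is_malignant())
```

Keeping the patient-level value separate from the per-nodule mapping lets a case carry a patient diagnosis even when no individual nodule could be diagnosed.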

For each lesion, there is also information provided as to how the diagnosis was established including options such as:

unknown – not clear how diagnosis was established

review of radiological images to show 2 years of stable nodule

biopsy

surgical resection

progression or response

Software

pylidc

pylidc is an object-relational mapping (using SQLAlchemy) for the data provided in the LIDC dataset. Some of the capabilities of pylidc include querying LIDC annotations in an SQL-like fashion, converting the nodule segmentation contours into voxel labels, and visualizing segmentations as image overlays. If you find this tool useful in your research, please cite the following paper:

Citation

Matthew C. Hancock, Jerry F. Magnan.  Lung nodule malignancy classification using only radiologist quantified image features as inputs to statistical learning algorithms: probing the Lung Image Database Consortium dataset with two statistical learning methods.  SPIE Journal of Medical Imaging. Dec. 2016.  https://doi.org/10.1117/1.JMI.3.4.044504
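To make the idea of converting a segmentation contour into voxel labels concrete, here is a minimal pure-Python sketch for a single slice. It is not pylidc's implementation: it uses a generic even-odd ray-casting test on pixel centers, and the function names and the sample contour are invented for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contour_to_mask(contour, width, height):
    """Label pixels whose centers fall inside the contour with 1, else 0."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, contour) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Invented square contour with vertices given as (x, y) pixel coordinates.
square = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = contour_to_mask(square, width=6, height=6)
for row in mask:
    print("".join(str(value) for value in row))
```

Applying the same conversion to the contour on each slice and stacking the results gives a three-dimensional label volume. How boundary pixels should be treated is a separate convention that this sketch does not address.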

MAX

MAX (“multi-purpose application for XML”) performs nodule matching and pmap generation based on the XML files provided with the LIDC/IDRI Database. It also performs certain QA and QC tasks and other XML-related tasks.

MAX is written in Perl and was developed under RedHat Linux. It has been run under Windows.

Downloading MAX and its associated files implies acceptance of the following notice (also available here and in the distro as a text file):

DISCLAIMER: MAX is not guaranteed to process all input correctly. Possible errors include (but are not limited to) the inability to process correctly some types of nodule ambiguity (where nodule ambiguity refers to overlap between nodule markings having complicated shapes or to overlap between a nodule marking and a non-nodule mark).

Download the distro (max-V107.zip); view/download ReadMe.txt (a text file that is also included in the distro).

LIDC 2 Image Toolbox (Matlab)

This tool is a community contribution developed by Thomas Lampert. It is designed for extracting individual annotations from the XML files of the LIDC-IDRI dataset and converting them, along with the DICOM images, into TIF format for easier processing in Matlab. It is available for download from: https://sites.google.com/site/tomalampert/code.

Related Publications

Publications by the Dataset Authors

The authors recommended this paper as the best source of additional information about this dataset:

Armato SG 3rd, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA, MacMahon H, Van Beek EJ, Yankelevitz D, Biancardi AM, Bland PH, Brown MS, Engelmann RM, Laderach GE, Max D, Pais RC, Qing DP, Roberts RY, Smith AR, Starkey A, Batra P, Caligiuri P, Farooqi A, Gladish GW, Jude CM, Munden RF, Petkovska I, Quint LE, Schwartz LH, Sundaram B, Dodd LE, Fenimore C, Gur D, Petrick N, Freymann J, Kirby J, Hughes B, Casteele AV, Gupte S, Sallam M, Heath MD, Kuhn MH, Dharaiya E, Burns R, Fryd DS, Salganicoff M, Anand V, Shreter U, Vastagh S, Croft BY, Clarke LP. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38: 915–931, 2011. DOI: https://doi.org/10.1118/1.3528204

The Collection authors suggest the below will give context to this dataset:

Hancock, MC, Magnan, JF. Lung nodule malignancy classification using only radiologist quantified image features as inputs to statistical learning algorithms: probing the Lung Image Database Consortium dataset with two statistical learning methods. SPIE Journal of Medical Imaging. Dec. 2016. https://doi.org/10.1117/1.JMI.3.4.044504

Research Community Publications

TCIA maintains a list of publications which leverage our data. If you have a manuscript you’d like to add please contact TCIA’s Helpdesk.

 

 

Previous Versions

Version 3: Updated 2015/07/27

*Replace any manifests downloaded prior to 2/24/2023. Please download a new manifest by clicking on the download button in the Images row of the table above. Manifests downloaded prior to 2/24/2023 may not include all series in the collection.

Prior to 7/27/2015, many of the series in the LIDC-IDRI collection had inconsistent values in the DICOM Frame of Reference UID, DICOM tag (0020,0052). Each image had a unique value for Frame of Reference, whereas the value should be consistent across a series. This has been corrected. In addition, the following tags, which were present but should not have been, were removed: (0020,0200) Synchronization Frame of Reference, (3006,0024) Referenced Frame of Reference, and (3006,00c2) Related Frame of Reference.
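A consistency check of this kind is easy to sketch. The function below assumes the Series Instance UID (0020,000E) and Frame of Reference UID (0020,0052) have already been read from each image header with a DICOM library of your choice; the function name and the placeholder UIDs are invented for illustration.

```python
from collections import defaultdict


def inconsistent_series(headers):
    """Return series UIDs whose images carry more than one Frame of Reference UID.

    `headers` is an iterable of (series_instance_uid, frame_of_reference_uid)
    pairs, one pair per image.
    """
    seen = defaultdict(set)
    for series_uid, frame_uid in headers:
        seen[series_uid].add(frame_uid)
    return sorted(uid for uid, frames in seen.items() if len(frames) > 1)


# Invented placeholder UIDs, not real identifiers from the collection.
headers = [
    ("series-A", "frame-1"),
    ("series-A", "frame-1"),
    ("series-B", "frame-2"),
    ("series-B", "frame-3"),
]
print(inconsistent_series(headers))
```

Data downloaded after the correction should yield an empty list. A non-empty result may indicate files obtained before 7/27/2015.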

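Title | Data Type | Format | Access Points | Subjects | Studies | Series | Images | License
Images | - | DICOM | Download (125 GB); Search (download requires NBIA Data Retriever) | - | - | - | - | -
DICOM Metadata Digest | - | CSV | Download (313.96 KB) | - | - | - | - | -
Radiologist Annotations/Segmentations | - | XML | Download (8.62 MB) | - | - | - | - | -
Nodule Size List | - | WEB | Search | - | - | - | - | -
Nodule Counts by Patient | - | XLS | Download (40.35 KB) | - | - | - | - | -
Patient Diagnoses | - | XLS | Download (45 KB) | - | - | - | - | -

Version 2: Updated 2012/03/21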

On 2012-03-21 the XML associated with patient LIDC-IDRI-0101 was updated with a corrected version of the file. The old version is still available if needed for audit purposes.

Version 1: Updated 2011/06/23

There was a “pilot release” of 399 cases of the LIDC CT data via the NCI CBIIT installation of NBIA. The LIDC-IDRI collection contained on TCIA is the complete data set of all 1,010 patients, which includes all 399 pilot CT cases plus the additional 611 patient CTs and all 290 corresponding chest x-rays. A table that maps the old NBIA IDs to the new TCIA IDs can be downloaded by those who have obtained and analyzed the older data.

  For a subset of approximately 100 cases from among the initial 399 cases released, inconsistent rating systems were used among the 5 sites with regard to the spiculation and lobulation characteristics of lesions identified as nodules > 3 mm. The XML nodule characteristics data as it exists for some cases will be impacted by this error. We apologize for any inconvenience.

  Contrary to previous documentation (prior to March 2010), the correct ordering for the subjective nodule lobulation and nodule spiculation rating scales stored in the XML files is 1=none to 5=marked. The issue of consistency noted above still remains to be corrected.
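The rating-scale correction above lends itself to a simple range check. The sketch below covers only the scales this page describes: lobulation and spiculation run from 1 to 5, and internalStructure is limited to 1 to 4 because the annotation notes state that no fifth category exists. The function name and the sample ratings are invented for illustration.

```python
# Ranges taken from the notes on this page. Other characteristics are left
# out on purpose because their scales are not described here.
VALID_RANGES = {
    "lobulation": (1, 5),         # 1 = none ... 5 = marked
    "spiculation": (1, 5),        # 1 = none ... 5 = marked
    "internalStructure": (1, 4),  # no 5th category exists
}


def invalid_ratings(ratings):
    """Return {name: value} for ratings that are missing or out of range."""
    problems = {}
    for name, (low, high) in VALID_RANGES.items():
        value = ratings.get(name)
        if value is None or not (low <= value <= high):
            problems[name] = value
    return problems


# Invented sample: internalStructure carries the invalid value 5.
sample = {"lobulation": 1, "spiculation": 5, "internalStructure": 5}
print(invalid_ratings(sample))
```

A check like this catches out-of-range and missing values, but it cannot detect the cases where sites applied the lobulation and spiculation scales inconsistently, since those values are still within range.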
