This app removes Protected Health Information (PHI) from user-uploaded clinical notes. It is built on Philter, open-source software designed to accurately and securely de-identify free-text clinical notes. Philter combines rule-based and statistical natural language processing (NLP) approaches to remove PHI from clinical documents. It handles a wide range of PHI types and preserves the original structure of the text by replacing each piece of PHI with an obfuscated token (a series of asterisks) of the same length.
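To make the same-length replacement concrete, here is a minimal sketch in Python. It is not Philter's code or API: the three regular expressions and the function name `mask_same_length` are hypothetical stand-ins chosen for illustration, and Philter's real pattern set, dictionaries, and statistical components are far more extensive.

```python
import re

# Hypothetical, illustrative patterns only; these are NOT Philter's patterns.
ILLUSTRATIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like numbers
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),  # slash-separated dates
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),        # US-style phone numbers
]


def mask_same_length(text, patterns=ILLUSTRATIVE_PATTERNS):
    """Replace every character of each match with '*', keeping text length."""
    chars = list(text)
    for pattern in patterns:
        # Match against the original text so earlier masks cannot hide later hits.
        for match in pattern.finditer(text):
            for i in range(match.start(), match.end()):
                chars[i] = "*"
    return "".join(chars)


if __name__ == "__main__":
    note = "Seen on 03/14/2019, call 415-555-0123."
    masked = mask_same_length(note)
    print(masked)
    # Character offsets are unchanged, so downstream annotations still line up.
    assert len(masked) == len(note)
```

Because each masked span keeps its original length, line breaks, section headers, and character offsets in the note stay where they were, which is what allows the de-identified text to keep the structure of the original.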


Use Cases

  1. Research: Enables the safe use of clinical notes for research by removing PHI, which preserves patient privacy and supports compliance with HIPAA regulations.
  2. Data Sharing: Allows healthcare providers to share clinical notes with other entities, such as researchers or partner institutions, without compromising patient confidentiality.
  3. Clinical Data Analysis: Helps in extracting and analyzing data from clinical notes for healthcare studies without the risk of exposing PHI.
  4. Healthcare AI Development: Lets developers build AI models that learn from large datasets of clinical notes without accessing sensitive information.

Limitations

  • The tool may miss PHI that appears in a highly unusual context or format.
  • In the evaluation reported in the paper "Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes", Philter removed 99.46% of PHI in the clinical notes. That is a high recall, but Philter is not perfect: it can miss edge cases, and it can also mark non-PHI words as PHI (false positives).
  • Philter's effectiveness depends on how comprehensive its underlying dictionaries and detection patterns are, and these may need periodic updates to stay current.

Evidence

In studies comparing Philter to other de-identification tools, Philter demonstrated the highest overall recall on multiple test corpora. On a corpus of 2,000 notes it achieved a recall of 99.46%, outperforming the other algorithms in each category of PHI.

See "Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes" for the paper the Philter library is based on, and for the code.

Owner's Insight

The development team emphasizes Philter's high recall, its customizability, and the importance of community involvement in improving the tool. They highlight its efficiency, its privacy-centric approach, and its potential to enable broader use of clinical notes for research. The team also points out that the tool can run in secure environments without an internet connection, so notes never have to leave the protected network.

Taken together, these details show Philter's strong performance in removing PHI from clinical notes, which makes it a valuable tool for researchers and healthcare providers who handle sensitive patient data.
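The recall figure and the false-positive caveat discussed above correspond to two standard metrics. The sketch below shows how they are typically computed from token-level annotations; the function name `recall_and_precision` and the toy token indices are assumptions made for illustration and are not taken from the Philter paper or its evaluation scripts.

```python
def recall_and_precision(gold_phi, predicted_phi):
    """Compute recall and precision over sets of PHI token positions.

    gold_phi:      positions a human annotator marked as PHI
    predicted_phi: positions the de-identification tool masked
    """
    gold, predicted = set(gold_phi), set(predicted_phi)
    true_pos = len(gold & predicted)    # PHI that was correctly masked
    false_neg = len(gold - predicted)   # PHI that was missed (hurts recall)
    false_pos = len(predicted - gold)   # non-PHI that was masked (hurts precision)
    recall = true_pos / (true_pos + false_neg) if gold else 1.0
    precision = true_pos / (true_pos + false_pos) if predicted else 1.0
    return recall, precision


if __name__ == "__main__":
    # Toy data: five PHI tokens, four of them caught, plus one false alarm.
    gold = {2, 3, 7, 8, 9}
    predicted = {2, 3, 7, 8, 12}
    recall, precision = recall_and_precision(gold, predicted)
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

For de-identification, recall is usually the metric that matters most, since a missed identifier is a privacy leak, while a false positive only costs some readability of the masked note.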

Peer reviewed

Warning: This application or model has been peer reviewed, but still may occasionally produce unsafe outputs.

  • Favorites: 1
  • Executions: 33

  • Natural Language Processing


Venkata Chengalvala
