To generate NLP features, use NILE to parse the EMR narrative notes and count the positive (non-negated) mentions of every concept unique identifier (CUI) in the dictionary.
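The counting step can be illustrated with a small Java sketch. It does not call NILE. It assumes NILE's output for each note has already been reduced to a list of mentions, each carrying a CUI and a negation flag. The `CuiCounter` class and `Mention` record are hypothetical names. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: turn per-note concept mentions into count features.
 * The Mention record is an assumed input shape, NOT NILE's actual output format.
 */
public class CuiCounter {

    /** One concept mention found in a note (assumed representation). */
    public record Mention(String cui, boolean negated) {}

    /**
     * Counts positive (non-negated) mentions of each dictionary CUI.
     * Every dictionary CUI starts at 0, so each note yields counts over
     * the same set of CUIs even if some never appear.
     */
    public static Map<String, Integer> countPositive(List<Mention> mentions,
                                                     List<String> dictionaryCuis) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String cui : dictionaryCuis) {
            counts.put(cui, 0);
        }
        for (Mention m : mentions) {
            // Ignore negated mentions and CUIs that are not in the dictionary.
            if (!m.negated() && counts.containsKey(m.cui())) {
                counts.merge(m.cui(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Example CUIs and mentions for illustration only.
        List<String> dict = List.of("C0010054", "C0020538");
        List<Mention> mentions = List.of(
                new Mention("C0010054", false),
                new Mention("C0010054", true),
                new Mention("C0020538", false),
                new Mention("C0020538", false));
        System.out.println(countPositive(mentions, dict));
    }
}
```

Running `main` prints `{C0010054=1, C0020538=2}`: the negated mention of the first CUI is not counted. Seeding every dictionary CUI with zero keeps the feature columns identical across notes.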
As an example, download the “Training: RiskFactors Complete Set 1 MAE” data, listed under “2014 De-identification and Heart Disease Risk Factors Challenge” in the i2b2 NLP Research Data Sets.
Use xml_Utils.java to extract the notes from the downloaded XML files.
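The sketch below illustrates the kind of extraction this step performs, using only the JDK's DOM parser. It is not the contents of xml_Utils.java. It assumes each i2b2 file stores the note in a `<TEXT>` element (typically wrapped in CDATA) and writes one `.txt` file per `.xml` file. `NoteExtractor` is a hypothetical name; Java 11 or later is assumed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical stand-in for the note-extraction step (not xml_Utils.java). */
public class NoteExtractor {

    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Returns the content of the first TEXT element, or "" if there is none. */
    private static String firstText(Document doc) {
        NodeList nodes = doc.getElementsByTagName("TEXT"); // assumed element name
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    /** Extracts the note text from an XML document held in a string. */
    public static String extractNote(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return firstText(newBuilder().parse(new InputSource(new StringReader(xml))));
    }

    /** Writes one .txt file to outDir for every .xml file in inDir. */
    public static void extractAll(Path inDir, Path outDir)
            throws ParserConfigurationException, SAXException, IOException {
        Files.createDirectories(outDir);
        DocumentBuilder builder = newBuilder();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inDir, "*.xml")) {
            for (Path xmlFile : files) {
                // Parsing from the file lets the parser honour the declared encoding.
                String note = firstText(builder.parse(xmlFile.toFile()));
                String name = xmlFile.getFileName().toString().replaceFirst("\\.xml$", ".txt");
                Files.writeString(outDir.resolve(name), note);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: directory of downloaded .xml files; args[1]: output directory
        extractAll(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For example, `java NoteExtractor xml_dir txt_dir` would write the note text of each file in `xml_dir` to a matching `.txt` file in `txt_dir`.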
Then parse these notes with NILE, using the dictionary CAD_dict.txt generated by MetaMap.
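The layout of CAD_dict.txt is not described here, so the loader below assumes, purely for illustration, one `term|CUI` pair per line; check the actual file before adapting it. The distinct CUIs it collects correspond to the concepts whose positive mentions are counted as features. `DictionaryLoader` is a hypothetical name; Java 11 or later is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical loader for a "term|CUI" dictionary file (assumed format). */
public class DictionaryLoader {

    /** Parses "term|CUI" lines into a term -> CUI map, lower-casing terms. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> termToCui = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\\|");
            if (parts.length != 2 || parts[0].isBlank() || parts[1].isBlank()) {
                continue; // skip blank or malformed lines
            }
            termToCui.put(parts[0].trim().toLowerCase(), parts[1].trim());
        }
        return termToCui;
    }

    /** Distinct CUIs in the dictionary, sorted; several terms may share one CUI. */
    public static Set<String> cuis(Map<String, String> termToCui) {
        return new TreeSet<>(termToCui.values());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the dictionary file, e.g. CAD_dict.txt
        Map<String, String> dict = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.println(dict.size() + " terms, " + cuis(dict).size() + " CUIs");
    }
}
```

Mapping synonyms to a shared CUI means that, for example, a full term and its abbreviation contribute to the same feature count.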
Results from processing the “Training: RiskFactors Complete Set 1 MAE” data can be found in NLP2014_set1_res.txt.