Recipes
Debasish Bose
Affiliation not available
May 16, 2014
Abstract
The science of nutrition deals with all the effects on people of any
component found in food. This starts with the physiological and biochemical
processes involved in nourishment (how substances in food provide energy or
are converted into body tissues) and the diseases that result from
insufficiency or excess of essential nutrients (malnutrition). The role of
food components in the development of chronic degenerative diseases such as
coronary heart disease, cancers, and dental caries is a major target of
current research. There is growing interaction between nutritional science
and molecular biology (especially nutrigenomics), which may help to explain
the action of food components at the cellular level and the diversity of
human biochemical responses. However, in our daily lives we cook recipes
made of ingredients rather than focusing on raw food components. Beyond
following dietitians' advice and guidelines, it is difficult to measure our
daily nutritional intake continuously without manually entering the weight
and amount of each constituent ingredient. Apart from this manual process,
effective nutritional intake also depends on the cooking process and the
retention factors of the individual ingredients. To alleviate such
difficulties, we propose an algorithm and an accompanying web-based tool to
automatically extract nutritional information from any text-based recipe.
1 Introduction
2 Procedure
2.1 Information Extraction
2.2 Ontology Mapping
        "food_synonym", "keyword_repeat", "snowball", "unique_stem"],
    "type" => "custom"
    }
Of the various filters configured, food_synonym is the most important for
the mapping process. This filter uses a partially auto-generated file (many
food items in the USDA database have common names) that lists words
occurring frequently in the recipe domain together with their equivalent or
synonymous words in the indexed USDA data. For example, using the Solr
synonym format:
    aubergine => eggplant
    silverbeet => chard
    gherkin => pickle
    sultanas => raisins golden
Using the above file (food_synonym.txt), the food_synonym filter is built as:
    "food_synonym" => {
      "type" => "synonym",
      "synonyms_path" =>
        Rails.root.join("config", "food_synonym.txt").to_s
    }
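To illustrate what such a synonym file does to an ingredient string, the following is a minimal sketch in plain Ruby. It is not the Elasticsearch filter itself: the helper names `parse_synonyms` and `apply_synonyms` are hypothetical, and the tokenization (lowercase, letters only) is a simplifying assumption.

```ruby
# Parse Solr-style "a => b" synonym rules and apply them to ingredient tokens.
# A simplified stand-in for a synonym token filter, not the real implementation.

SYNONYM_RULES = <<~RULES
  aubergine => eggplant
  silverbeet => chard
  gherkin => pickle
  sultanas => raisins golden
RULES

# Build a Hash from each left-hand term to its list of replacement tokens.
def parse_synonyms(text)
  text.each_line.with_object({}) do |line, map|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    lhs, rhs = line.split('=>', 2).map(&:strip)
    next if rhs.nil?
    replacement = rhs.split
    lhs.split(',').each { |term| map[term.strip.downcase] = replacement }
  end
end

# Lowercase the text, keep alphabetic tokens, and swap every token with a rule.
def apply_synonyms(text, map)
  text.downcase.scan(/[a-z]+/).flat_map { |tok| map.fetch(tok, [tok]) }
end

synonyms = parse_synonyms(SYNONYM_RULES)
puts apply_synonyms('1 large aubergine, diced', synonyms).join(' ')
```

A one-to-many rule such as the sultanas entry expands a single recipe token into several index-side tokens, which is why the replacement is kept as a list.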
We issue a Fuzzy Like This Query (FLTQ) built from the raw ingredient obtained
in the extraction process, with a specific configuration of the
max_query_terms and fuzziness parameters. This query fuzzifies all terms
provided as strings and then picks the best differentiating terms. In effect,
it mixes the behaviour of FuzzyQuery and MoreLikeThis, with special
consideration of fuzzy scoring factors. Instead of relying on a single
analyzed index field, we also store an additional index field
(description.simple) that does not pass the token stream through a snowball
filter (stemmer). This increases the precision of our query and of the
overall mapping process.
    indexes :description, :type => 'multi_field', :fields => {
      'simple' => { :type => 'string',
                    :analyzer => 'food_analyzer_simple' },
      'snowed' => { :type => 'string',
                    :analyzer => 'food_analyzer_snowed' }
    }
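The Fuzzy Like This query described above could be assembled as a plain Ruby hash along the following lines. This is only a sketch: the helper name `fuzzy_like_this_query` is hypothetical, and the default values for `max_query_terms` and `fuzziness` are illustrative assumptions, since the configuration actually used is not stated. The `fuzzy_like_this` query type belongs to the Elasticsearch 1.x series.

```ruby
require 'json'

# Build the body of a fuzzy_like_this query for one raw ingredient string.
# Parameter defaults are illustrative assumptions, not the paper's settings.
def fuzzy_like_this_query(raw_ingredient, max_query_terms: 12, fuzziness: 0.6)
  {
    query: {
      fuzzy_like_this: {
        # Search the unstemmed and the stemmed sub-field of the multi-field.
        fields: ['description.simple', 'description.snowed'],
        like_text: raw_ingredient,
        max_query_terms: max_query_terms,
        fuzziness: fuzziness
      }
    }
  }
end

puts JSON.pretty_generate(fuzzy_like_this_query('chickpeas'))
```

Querying both sub-fields lets an exact, unstemmed match on description.simple contribute alongside the more tolerant stemmed match.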
To optimize the overall mapping process, we use Elasticsearch's Multi Search
API to map all ingredients of a given recipe to their respective food item
nodes in the ontology in a single request.
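A Multi Search request body is newline-delimited JSON: one header line and one query line per search. The sketch below builds such a body for a list of ingredients. The helper name `msearch_body` and the index name `foods` are hypothetical, and the per-ingredient query is reduced to its essentials.

```ruby
require 'json'

# Build a newline-delimited Multi Search body with one header line and one
# query line per ingredient. The index name "foods" is a hypothetical example.
def msearch_body(ingredients, index: 'foods')
  lines = ingredients.flat_map do |ingredient|
    header = { index: index }
    body = {
      query: {
        fuzzy_like_this: { fields: ['description.simple'], like_text: ingredient }
      },
      size: 1 # keep only the best-matching food item per ingredient
    }
    [JSON.generate(header), JSON.generate(body)]
  end
  lines.join("\n") + "\n" # the body must end with a newline
end

payload = msearch_body(['salt', 'chickpeas'])
puts payload
```

The responses come back in the same order as the searches, so each result can be paired with its ingredient by position.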
2.3 Lexical Clues
After the parsing and mapping phase, one critical step is to determine the
overall weight of each ingredient. This is perhaps the most complex step in
the whole process, and the subsequent nutritional calculation depends heavily
on it. It is complicated by ingredient listings such as
    Pinch of salt to taste
    Two 15-ounce cans chickpeas (4 cups), rinsed and drained
A pinch is a very common kitchen unit and has to be handled appropriately in
the weight calculation. Similarly, the second ingredient (chickpeas) carries
a weight hint in its description (4 cups). This step identifies such lexical
clues and incorporates them into the weight deduction. It is often overlooked
in the classical Information Extraction literature, but is discussed by [1].
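As a rough illustration of how such clues might be pulled out of the two listings above, here is a regex-based sketch. It is not the method used by the system or by [1]: the unit list, the number words, and the helper names (`measure_in`, `lexical_clues`, `grams_from_mass`) are all assumptions, and "ounce" is assumed to be a unit of mass rather than volume.

```ruby
# Small, illustrative vocabularies; a real system would need far larger ones.
NUMBER_WORDS = {
  'a' => 1.0, 'an' => 1.0, 'one' => 1.0, 'two' => 2.0,
  'three' => 3.0, 'four' => 4.0
}.freeze
UNITS = %w[pinch cup ounce can tablespoon teaspoon].freeze
GRAMS_PER_OUNCE = 28.3495 # avoirdupois ounce

NUM     = /\d+(?:\.\d+)?|\b(?:#{NUMBER_WORDS.keys.join('|')})\b/i
UNIT    = /(?:#{UNITS.join('|')})/i
MEASURE = /(#{NUM})[\s\-]+(#{UNIT})(?:e?s)?\b/i

def to_number(token)
  NUMBER_WORDS.fetch(token.downcase) { token.to_f }
end

# Find the first "<number> <unit>" pair; a bare unit implies a quantity of one.
def measure_in(text)
  m = MEASURE.match(text)
  return { quantity: to_number(m[1]), unit: m[2].downcase } if m
  u = /\b(#{UNIT})(?:e?s)?\b/i.match(text)
  u && { quantity: 1.0, unit: u[1].downcase }
end

# Split a listing into a package count, a main measure, and a bracketed hint.
def lexical_clues(line)
  hint_text = line[/\(([^)]*)\)/, 1]         # text inside the first parentheses
  main = line.sub(/\([^)]*\)/, '')
  count = main[/\A\s*(#{NUM})\s+(?=\d)/i, 1] # e.g. "Two" in "Two 15-ounce cans"
  {
    count: count ? to_number(count) : 1.0,
    measure: measure_in(main),
    hint: hint_text && measure_in(hint_text)
  }
end

# Convert to grams only when the main measure is a mass unit.
def grams_from_mass(clues)
  m = clues[:measure]
  return nil unless m && m[:unit] == 'ounce'
  clues[:count] * m[:quantity] * GRAMS_PER_OUNCE
end

chickpeas = lexical_clues('Two 15-ounce cans chickpeas (4 cups), rinsed and drained')
salt = lexical_clues('Pinch of salt to taste')
p chickpeas
p salt
p grams_from_mass(chickpeas)
```

Volume units such as cup or pinch cannot be converted without a per-food weight for that measure, so the sketch returns nil for them and leaves that lookup to the mapped food item.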
2.4 Nutritional Information
The Recommended Dietary Intake (RDI) is consulted in order to display the
accumulated values of the various macro- and micronutrients of the recipe.
There are two complications:
- RDI values of some nutrients (e.g. cholesterol, dietary fiber) depend on
  the total calorie intake.
- RDI values are complex functions of age (life stage) and of special
  conditions or diseases (e.g. diabetes).
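The two complications can be sketched as a lookup that is either fixed per life stage or scaled by calorie intake. Everything in the tables below is a placeholder chosen for illustration and should not be read as dietary guidance. The names `FIXED_RDI`, `PER_1000_KCAL`, `rdi_for`, and `percent_of_rdi` are hypothetical, and special conditions such as diabetes are not modelled.

```ruby
# Placeholder reference values for illustration only, NOT dietary guidance.
FIXED_RDI = {
  adult: { protein_g: 50.0 },
  child: { protein_g: 20.0 }
}.freeze

# Nutrients whose reference value scales with total calorie intake.
PER_1000_KCAL = { fiber_g: 14.0 }.freeze

# Resolve the reference intake of one nutrient for a given person.
def rdi_for(nutrient, life_stage:, daily_kcal:)
  if PER_1000_KCAL.key?(nutrient)
    PER_1000_KCAL[nutrient] * daily_kcal / 1000.0
  else
    FIXED_RDI.fetch(life_stage).fetch(nutrient)
  end
end

# Express each accumulated nutrient amount as a percentage of its reference.
def percent_of_rdi(amounts, life_stage:, daily_kcal:)
  amounts.map do |nutrient, amount|
    rdi = rdi_for(nutrient, life_stage: life_stage, daily_kcal: daily_kcal)
    [nutrient, (100.0 * amount / rdi).round(1)]
  end.to_h
end

recipe = { protein_g: 10.0, fiber_g: 7.0 }
p percent_of_rdi(recipe, life_stage: :adult, daily_kcal: 2000)
p percent_of_rdi(recipe, life_stage: :child, daily_kcal: 1500)
```

The same recipe totals produce different percentages for different people, which is what makes the resulting annotation personalized.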
The proposed system handles both complications and generates a more
personalized nutritional annotation for the given recipe. For example, the
Creme Brulee Oatmeal recipe has the following nutritional profile
References
[1] Fadi Badra, Sylvie Despres, and Rim Djedidi. Ontology and lexicon: the
missing link. In Workshop Proceedings of the 9th International Conference
on Terminology and Artificial Intelligence, pages 16-18, 2011.