[Metric] BLEU(Bilingual Evaluation Understudy)

1. BLEU

- Bilingual Evaluation Understudy

- The geometric mean of n-gram precisions (so low precision values pull the score down strongly), multiplied by a brevity penalty
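
For reference, here is a sketch of the usual definition. The notation is assumed here rather than taken from this post: p_n is the clipped n-gram precision, w_n is the weight for each n-gram order (typically 1/N with N = 4), c is the length of the generated text, and r is the length of the reference.

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

With uniform weights the exponential term is the geometric mean of p_1, ..., p_N, so a single small precision lowers the whole score.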

2. Why use BLEU?

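1. When evaluating natural language generation, what matters is how well the generated sentence serves its purpose (whether it is meaningful), so a difference of one or two words should not matter much

2. It can take sentence length and repeated words into account when scoring

3. It uses a brevity penalty to fix the problem of short sentences receiving inflated scores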
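
The brevity penalty can be illustrated with a few lines of Python. This is a minimal sketch assuming the standard formulation (1 when the generated text is longer than the reference, exp(1 - r/c) otherwise); the function name `brevity_penalty` and its parameters are hypothetical and chosen only for this example.

```python
import math


def brevity_penalty(candidate_len, reference_len):
    """Return the multiplicative penalty for a short generated text."""
    if candidate_len == 0:
        # An empty output gets no credit at all.
        return 0.0
    if candidate_len > reference_len:
        # Longer than the reference: no penalty.
        return 1.0
    # Shorter than (or equal to) the reference: exponential decay.
    return math.exp(1 - reference_len / candidate_len)


print(brevity_penalty(8, 5))            # 1.0
print(round(brevity_penalty(3, 5), 3))  # 0.513
```

A 3-token output scored against a 5-token reference keeps only about half of its precision-based score, which is what stops very short outputs from winning on precision alone.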

3. Calculating the BLEU score

Reference text : "Where is the cat sleeping?"
Generated text : "Why is the cat sleeping on the mat?"

1-gram precision (=4/8; "the" appears twice in the generated text but only once in the reference, so it is counted once)

Reference : (Where), (is), (the), (cat), (sleeping)
Generated : (Why), (is), (the), (cat), (sleeping), (on), (the), (mat)

2-gram precision (=3/7)

Reference : (Where, is), (is, the), (the, cat), (cat, sleeping)
Generated : (Why, is), (is, the), (the, cat), (cat, sleeping), (sleeping, on), (on, the), (the, mat)

3-gram precision (=2/6)

Reference : (Where, is, the), (is, the, cat), (the, cat, sleeping)
Generated : (Why, is, the), (is, the, cat), (the, cat, sleeping), (cat, sleeping, on), (sleeping, on, the), (on, the, mat)

4-gram precision (=1/5)

Reference : (Where, is, the, cat), (is, the, cat, sleeping)
Generated : (Why, is, the, cat), (is, the, cat, sleeping), (the, cat, sleeping, on), (cat, sleeping, on, the), (sleeping, on, the, mat)
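
The four precisions above can be reproduced with a short script, which also combines them into a final score. This is a minimal sketch, not a reference implementation: the helper names (`tokenize`, `ngrams`, `modified_precision`, `bleu`) are hypothetical, the tokenizer simply drops punctuation as the lists above do, and uniform weights over 1- to 4-grams with a single reference are assumed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: keep word characters only, so "?" is dropped.
    return re.findall(r"\w+", text)


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    """Return (clipped matches, total candidate n-grams)."""
    ref_counts = Counter(ngrams(reference, n))
    cand_counts = Counter(ngrams(candidate, n))
    # Clip each candidate n-gram count by its count in the reference.
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched, sum(cand_counts.values())


def bleu(reference, candidate, max_n=4):
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matched, total = modified_precision(reference, candidate, n)
        if matched == 0 or total == 0:
            # Without smoothing, one zero precision makes the score zero.
            return 0.0
        log_sum += math.log(matched / total) / max_n
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


reference = tokenize("Where is the cat sleeping?")
candidate = tokenize("Why is the cat sleeping on the mat?")

precisions = [modified_precision(reference, candidate, n) for n in range(1, 5)]
print(precisions)  # [(4, 8), (3, 7), (2, 6), (1, 5)]

score = bleu(reference, candidate)
print(round(score, 4))  # 0.3457
```

Because the generated text (8 tokens) is longer than the reference (5 tokens), the brevity penalty is 1 here and the score is just the geometric mean of the four precisions.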
