BERT (language model)

Learn more at: en.wikipedia.org/wiki/BERT_(language_model)
Content derived and adapted from Wikipedia articles, licensed under CC BY-SA.

BERT (Bidirectional Encoder Representations from Transformers): A transformer-based model pre-trained on a large corpus of text data to understand linguistic context effectively using a deeply bidirectional neural network.
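As a concrete illustration (not part of the original article), here is a minimal sketch of loading a pre-trained BERT with the Hugging Face transformers library and extracting contextual token representations; the checkpoint name "bert-base-uncased" and the sample sentence are assumptions chosen for demonstration:

```python
# Sketch: load a pre-trained BERT and obtain one contextual vector per token.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("BERT reads text in both directions.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch, sequence length, hidden size = 768 for BERT-Base).
print(outputs.last_hidden_state.shape)
```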

Transformer: A neural network architecture that uses self-attention mechanisms to weigh the significance of different words in a sequence to enhance language processing tasks.

Tokenizer: A component that converts text into a sequence of integers or tokens, enabling models like BERT to process language data.
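A hedged sketch of tokenization in practice, again assuming the Hugging Face BertTokenizer and the "bert-base-uncased" vocabulary; the exact integer IDs depend on that vocabulary:

```python
# Sketch: turning raw text into the token IDs that BERT consumes.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer("Hello, world!")
print(encoded["input_ids"])                                   # integer IDs, including [CLS] and [SEP]
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))  # the corresponding token strings
```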

Embedding: A representation of tokens in a lower-dimensional continuous vector space, facilitating operations on text data within a model.
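A minimal PyTorch sketch of an embedding table mapping integer token IDs to dense vectors; the vocabulary size and hidden size match BERT-Base, while the ID sequence is a hypothetical example:

```python
# Sketch: an embedding table maps integer token IDs to dense vectors.
import torch
import torch.nn as nn

vocab_size, hidden_size = 30522, 768   # BERT-Base sizes, used here for illustration
embedding = nn.Embedding(vocab_size, hidden_size)

token_ids = torch.tensor([[101, 7592, 1010, 2088, 102]])  # hypothetical ID sequence
vectors = embedding(token_ids)
print(vectors.shape)  # (1, 5, 768): one 768-dimensional vector per token
```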

Encoder: Part of the transformer architecture that processes input data by leveraging self-attention to understand context.

Token Type Embeddings: Dense vectors that map each token ID in the vocabulary (conceptually a one-hot vector) to a continuous representation; this is the model's word-embedding table.

Position Embeddings: Vectors that encode the position of each token within a sequence; BERT learns absolute position embeddings rather than using the fixed sinusoidal encodings of the original Transformer.

Segment Type Embeddings: Dense vectors that distinguish the two text segments in an input pair (e.g., sentence A vs. sentence B), indexed by a binary segment ID.

LayerNorm: A normalization technique applied to vectors to maintain numerical stability and performance across layers.
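The three embedding types above are summed and then normalized to form BERT's input representation. Below is a minimal sketch of that combination, assuming PyTorch and illustrative sizes (vocabulary 30,522, hidden size 768, dropout 0.1):

```python
# Sketch of a BERT-style input embedding: token + position + segment
# embeddings are summed, then LayerNorm (and dropout) is applied.
import torch
import torch.nn as nn

class BertStyleEmbeddings(nn.Module):
    def __init__(self, vocab_size=30522, hidden_size=768,
                 max_position=512, num_segments=2):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden_size)
        self.position = nn.Embedding(max_position, hidden_size)  # learned, not sinusoidal
        self.segment = nn.Embedding(num_segments, hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.dropout = nn.Dropout(0.1)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        x = self.token(token_ids) + self.position(positions) + self.segment(segment_ids)
        return self.dropout(self.layer_norm(x))

emb = BertStyleEmbeddings()
ids = torch.randint(0, 30522, (1, 8))
segments = torch.zeros(1, 8, dtype=torch.long)
print(emb(ids, segments).shape)  # (1, 8, 768)
```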

Masked Language Model (MLM): A training task where BERT predicts obscured words within a sentence to understand linguistic context bidirectionally.

Next Sentence Prediction (NSP): A training task where BERT evaluates if one sentence logically follows another to capture inter-sentence relationships.
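For NSP, two sentences are packed into one input with [CLS]/[SEP] markers and segment IDs. A hedged sketch using the Hugging Face tokenizer, with two made-up sentences:

```python
# Sketch: encoding a sentence pair the way BERT sees it for NSP,
# with [CLS]/[SEP] markers and segment (token_type) IDs.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

first = "The cat sat on the mat."   # hypothetical sentence A
second = "It then fell asleep."     # hypothetical sentence B
encoded = tokenizer(first, second)

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print(encoded["token_type_ids"])  # 0s for segment A, 1s for segment B
```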

WordPiece: A sub-word tokenization technique that handles out-of-vocabulary words by breaking them into smaller, known pieces.
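A small sketch of sub-word splitting; the example word is arbitrary, and the exact split depends on the learned vocabulary:

```python
# Sketch: WordPiece splits rare or unseen words into sub-word pieces.
# Continuation pieces are marked with a "##" prefix.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("unaffable"))
```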

[MASK] Token: A special token used during training to replace certain words, challenging the model to predict the missing words using context.
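Putting MLM and the [MASK] token together, here is a hedged sketch of asking a pre-trained masked language model to fill in a blank; the prompt sentence is an assumption for demonstration:

```python
# Sketch: predicting the word hidden behind a [MASK] token.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```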

Attention Mechanism: Part of a transformer model that allows it to concentrate on specific parts of input data, crucial for understanding context.
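A minimal sketch of the core operation, single-head scaled dot-product self-attention, with small illustrative sizes and randomly initialized projection matrices:

```python
# Sketch: single-head scaled dot-product self-attention.
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, seq_len, d_model); w_* project x to queries, keys, values."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))  # similarity of every token pair
    weights = torch.softmax(scores, dim=-1)                   # attention weights sum to 1 per token
    return weights @ v, weights

d_model = 16
x = torch.randn(1, 5, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape, weights.shape)  # (1, 5, 16) (1, 5, 5)
```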

Feed-forward Network: A neural network layer in transformers that processes outputs from attention layers to generate a final representation.
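A sketch of the position-wise feed-forward block, assuming the BERT-Base sizes (hidden 768, intermediate 3072) and the GELU activation BERT uses:

```python
# Sketch: the feed-forward block applied after attention in each transformer layer.
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, hidden_size=768, intermediate_size=3072):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.GELU(),
            nn.Linear(intermediate_size, hidden_size),
        )

    def forward(self, x):
        return self.net(x)

ffn = FeedForward()
print(ffn(torch.randn(1, 5, 768)).shape)  # (1, 5, 768)
```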

RoBERTa: A BERT variant that improves performance by optimizing the training procedure: removing the next-sentence-prediction task, training longer on more data with larger mini-batches, and using dynamic masking.

DistilBERT: A compressed version of BERT trained via knowledge distillation that retains most of BERT's performance while reducing model size and inference cost.

TinyBERT: A distilled, highly efficient version of BERT with a significantly reduced parameter count that aims to retain most of the original model's accuracy.

ALBERT: A BERT variant that shares parameters across layers, factorizes the embedding matrix, and replaces next-sentence prediction with a sentence-order prediction task.

ELECTRA: A BERT-inspired model trained with a generator-discriminator setup in the spirit of adversarial training: a small generator replaces some input tokens and the main model learns to detect which tokens were replaced.

DeBERTa: A BERT variant employing disentangled attention to separate positional and token encodings in its attention mechanism.

TPU (Tensor Processing Unit): A hardware accelerator designed by Google to speed up machine learning workloads such as neural network training.

BERT-BASE: The standard configuration of BERT for general language tasks, with 12 encoder layers, a hidden size of 768, and 12 attention heads (about 110 million parameters).

BERT-LARGE: An enlarged configuration with 24 encoder layers, a hidden size of 1024, and 16 attention heads (about 340 million parameters), offering stronger performance at higher cost.

BERT-TINY: A miniaturized configuration designed for efficiency, with only 2 encoder layers and a much smaller hidden size.
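A hedged sketch of how these size differences can be expressed as transformers BertConfig objects; the BERT-Tiny hidden size of 128 and its 2 attention heads are the commonly released configuration, stated here as an assumption:

```python
# Sketch: the main size differences between BERT configurations.
from transformers import BertConfig

base = BertConfig(num_hidden_layers=12, hidden_size=768, num_attention_heads=12)
large = BertConfig(num_hidden_layers=24, hidden_size=1024, num_attention_heads=16)
tiny = BertConfig(num_hidden_layers=2, hidden_size=128, num_attention_heads=2)

for name, cfg in [("BASE", base), ("LARGE", large), ("TINY", tiny)]:
    print(name, cfg.num_hidden_layers, "layers, hidden size", cfg.hidden_size)
```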

Global Pooling: A term from computer vision analogous to BERT's "pooler layer," referring to consolidating information into a single output representation.
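A short sketch contrasting BERT's pooler output (derived from the [CLS] position) with simple mean pooling over all token vectors; the input sentence is an arbitrary example:

```python
# Sketch: the pooler output versus mean pooling as a single-vector summary.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("A single sentence to summarize.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_summary = outputs.pooler_output                   # (1, 768) via the pooler layer
mean_summary = outputs.last_hidden_state.mean(dim=1)  # (1, 768) via mean pooling
print(cls_summary.shape, mean_summary.shape)
```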

Attention Weights: Numerical values representing the significance of different words in a sequence, used in the attention mechanism to comprehend context.

Fine-tuning: The process of training a pre-trained model like BERT on a specific task with additional data and task-specific adjustments.
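A hedged sketch of one fine-tuning step for sentence classification, where a task-specific head sits on top of the pre-trained encoder; the label, learning rate, and example sentence are illustrative assumptions:

```python
# Sketch: a single fine-tuning step for binary sentence classification.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

inputs = tokenizer("This movie was wonderful.", return_tensors="pt")
labels = torch.tensor([1])                # hypothetical "positive" label

model.train()
outputs = model(**inputs, labels=labels)  # the classification head computes the loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))
```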

GLUE (General Language Understanding Evaluation): A benchmark for assessing the performance of natural language processing models across multiple tasks.

SQuAD (Stanford Question Answering Dataset): A reading comprehension dataset for evaluating models' question-answering capabilities.

SWAG (Situations With Adversarial Generations): A benchmark for grounded commonsense inference in which a model must choose the most plausible continuation of a sentence from several candidates.
