Parameter (Model Weight)
The learnable values within a neural network that are adjusted during training to minimize the loss function. Model size is typically measured in parameter count.
What Parameters Are
Parameters are the numbers that define what a model has learned. In a neural network, they are the weights and biases of each layer. A 70B parameter model has 70 billion individual numbers that were optimized during training.
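Parameter counts follow directly from layer shapes. A minimal sketch (the 784→256→128→10 network is a hypothetical example, not from the original): each fully connected layer contributes in_features × out_features weights plus out_features biases.

```python
# Count learnable parameters in a small fully connected network.
# Every weight and bias counted here is one "parameter" -- one number
# adjusted during training to minimize the loss.

def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases for one fully connected layer."""
    return in_features * out_features + out_features

# A hypothetical 3-layer MLP: 784 -> 256 -> 128 -> 10
layers = [(784, 256), (256, 128), (128, 10)]
total = sum(linear_params(i, o) for i, o in layers)
print(total)  # prints 235146
```

The same bookkeeping scales up: a 70B model is this calculation repeated over far larger matrices.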
Parameters vs. Hyperparameters
Parameters are learned automatically during training (weights, biases). Hyperparameters are set manually before training begins (learning rate, batch size, architecture choices). The distinction matters because hyperparameters control how the parameters are learned, not what is learned.
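A toy gradient-descent loop makes the split concrete (the dataset and constants below are illustrative assumptions). The learning rate and step count are fixed by hand; the weight is the only value the training loop itself changes.

```python
# Hyperparameters: chosen by hand before training starts.
learning_rate = 0.1   # hyperparameter
steps = 100           # hyperparameter

# Parameter: learned automatically from data.
w = 0.0  # initialized arbitrarily, updated by gradient descent

# Fit y = 2x by minimizing mean squared error on a toy dataset.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
for _ in range(steps):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad

print(round(w, 4))  # converges toward 2.0
```

Changing `learning_rate` changes how fast `w` converges; only `w` is a parameter in the sense defined above.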
Scale
GPT-3: 175B parameters. GPT-4: estimated ~1.8T parameters (a mixture-of-experts architecture, so only a fraction of parameters is active per token). Llama 3 family: 8B to 405B. The relationship between parameter count and capability is complex: architecture, training data, and training compute all matter.
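Parameter count does set a hard floor on memory, though. A back-of-the-envelope sketch (byte sizes per numeric format are standard; the 70B figure is an example from above):

```python
# Rough memory needed just to store a model's weights at different
# numeric precisions. This ignores activations, KV cache, and
# optimizer state, which add substantially more during training.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in ("fp32", "fp16", "int8"):
    gb = weight_memory_gb(70e9, dtype)
    print(f"70B params @ {dtype}: {gb:.0f} GB")
```

This is why a 70B model needs on the order of 140 GB of memory at fp16 just to load its weights, before any inference overhead.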